Abstract.
Design patterns play a key role in software development: they capture best practices that make code easier to maintain and understand. Identifying them in source code is essential for analyzing and maintaining legacy systems. Modern large language models (LLMs) trained on code open up new approaches to automatic design pattern detection. However, the effect of different code representations on LLM-based classification accuracy remains underexplored. This study evaluates classifiers trained on embeddings generated by CodeT5, DeepSeek-Coder, and LLaMA (7B and 13B), using the DPD-Att dataset (14 categories, including “Unknown”). CodeT5 embeddings yield the highest and most stable results (up to 85% accuracy), while DeepSeek-Coder and LLaMA are competitive but less consistent.
Keywords:
code embeddings, code analysis, code representation, pattern recognition, design pattern detection.
DOI 10.14357/20718632250402
EDN PQBJOI
Pp. 17–28.