INTELLIGENCE SYSTEMS AND TECHNOLOGIES
A. V. Gayer Computationally Efficient Detection of the Region of Interest of the Russian Passport in the Image
Abstract. 

The article addresses the task of localizing a Russian Federation citizen's passport in photographs where the document occupies a small portion of the frame. This problem is particularly relevant for remote verification systems that require users to upload a selfie with their passport. At such a small scale the document has low effective resolution, which complicates both its localization and its recognition. To improve localization accuracy, an ultra-lightweight neural network model, YOLO-Passport, is proposed that first localizes the passport region, reducing the problem to a fixed document scale. Compared to compact YOLO detectors, YOLO-Passport requires an order of magnitude fewer operations and parameters. The proposed approach increased the detection recall of Russian passports from 91.6% to 97.4%. The model's inference time on a CPU is 3 ms, and its size in 8-bit format is only 340 KB, making it efficient for deployment in industrial systems and WASM-based web applications.
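
The idea of reducing the problem to a fixed document scale can be illustrated with a minimal sketch. It does not reproduce YOLO-Passport itself: the detection is a hard-coded hypothetical box, and the `Box` type, the 10% padding margin, and the 640-pixel target width are assumptions chosen only for illustration. The sketch pads a detected region of interest, clamps it to the image bounds, and computes the resize factor that brings the crop to a fixed width before any downstream localization.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates (hypothetical helper type)."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def width(self) -> int:
        return self.right - self.left

    @property
    def height(self) -> int:
        return self.bottom - self.top


def expand_and_clamp(box: Box, image_w: int, image_h: int,
                     margin: float = 0.1) -> Box:
    """Pad the box by `margin` of its size on each side, clamped to the image.

    The 10% default margin is an assumption, not a value from the article.
    """
    pad_x = round(box.width * margin)
    pad_y = round(box.height * margin)
    return Box(
        left=max(0, box.left - pad_x),
        top=max(0, box.top - pad_y),
        right=min(image_w, box.right + pad_x),
        bottom=min(image_h, box.bottom + pad_y),
    )


def scale_to_fixed_width(roi: Box, target_w: int = 640) -> Tuple[float, Tuple[int, int]]:
    """Return the resize factor and (width, height) that bring the ROI to `target_w`.

    The 640-pixel target width is an assumption used for illustration.
    """
    if roi.width <= 0 or roi.height <= 0:
        raise ValueError("empty region of interest")
    factor = target_w / roi.width
    return factor, (target_w, round(roi.height * factor))


# Hypothetical detection: a passport covering a small part of a 4000x3000 selfie.
detected = Box(left=1800, top=2000, right=2200, bottom=2280)
roi = expand_and_clamp(detected, image_w=4000, image_h=3000)
factor, out_size = scale_to_fixed_width(roi)
```

In this sketch the document is upscaled after cropping, so later stages always see the passport at roughly the same size regardless of how small it was in the original frame.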

Keywords: 

document recognition, deep learning, object detection, YOLO.

DOI 10.14357/20718632250301

EDN AEBPRY

PP. 3-12.

