Abstract.
This paper addresses the problem of reducing the computational load of road scene text recognition by making a stopping decision that cuts off further recognition. The contribution of the paper is the construction of stopping rules for real-time text recognition systems that combine per-frame results, together with an experimental evaluation on the open dataset RoadText-1K. We found that for fast-working systems the ROVER (Recognizer Output Voting Error Reduction) combination method performs best under the Levenshtein metric and majority voting performs best under the direct match metric; however, as the per-frame processing time increases, ROVER becomes consistently better. Furthermore, while selecting a single most focused frame is the worst strategy for fast-working systems, its relative rank improves as the processing time increases. Moreover, both choosing the single most focused frame and combining the three most focused frames are preferable for fast-working systems when the load on the device needs to be reduced.
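To make the compared strategies more concrete, the sketch below shows, under stated assumptions, the simpler building blocks mentioned in the abstract: the Levenshtein and direct match comparisons, majority voting over whole-string per-frame results, selection of the k most focused frames, and a stopping loop. It is not the authors' method. ROVER (which aligns results character by character before voting) is not implemented here, the normalization of the Levenshtein distance by the longer string length is an assumption, the per-frame focus scores are hypothetical inputs, and the stopping rule in the hypothetical `recognize_with_stopping` function (stop once the combined result is unchanged for `patience` frames) is an illustrative placeholder rather than one of the paper's stopping rules.

```python
from collections import Counter
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance with unit-cost insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Distance scaled to [0, 1] by the longer length (assumed normalization)."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def direct_match(a: str, b: str) -> bool:
    """Direct match metric: the whole string must be exactly equal."""
    return a == b


def majority_vote(results: list[str]) -> str:
    """Most frequent whole-string result; ties go to the earliest one seen."""
    # Counter.most_common orders equal counts by first occurrence (Python 3.7+).
    return Counter(results).most_common(1)[0][0]


def top_k_focused(frames: list[tuple[str, float]], k: int) -> list[str]:
    """Results of the k frames with the highest (hypothetical) focus score."""
    ranked = sorted(frames, key=lambda fr: fr[1], reverse=True)  # stable sort
    return [text for text, _ in ranked[:k]]


def recognize_with_stopping(frames: Iterable[str], patience: int = 3,
                            max_frames: Optional[int] = None) -> tuple[str, int]:
    """Accumulate per-frame results with majority voting and stop once the
    combined result has stayed unchanged for `patience` consecutive frames
    (hypothetical rule) or `max_frames` frames have been processed."""
    seen: list[str] = []
    combined = ""
    stable = 0
    for frame_result in frames:
        seen.append(frame_result)
        new_combined = majority_vote(seen)
        stable = stable + 1 if new_combined == combined else 0
        combined = new_combined
        if stable >= patience or (max_frames is not None and len(seen) >= max_frames):
            break
    return combined, len(seen)


if __name__ == "__main__":
    stream = ["STCP", "STOP", "STOP", "STOP", "STOP", "ST0P"]
    print(recognize_with_stopping(stream, patience=2))  # ('STOP', 5)
    focused = [("ST0P", 0.2), ("STOP", 0.9), ("STOP", 0.7), ("SIOP", 0.8)]
    print(majority_vote(top_k_focused(focused, 3)))     # STOP
```

In the usage example the loop stops after five of the six frames, which illustrates how a stopping decision saves the processing of the remaining frames; with the focus-based strategies, only one or three frames are recognized at all.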
Keywords:
Combination method, Reducing computational load, Real-time recognition, Road scene analysis, Text recognition, Video stream recognition.
DOI 10.14357/20718632240301
EDN MMVTBM
PP. 3-15.