| In recent years, with the rapid development of mobile Internet and multimedia technology and the growing maturity of artificial intelligence, a large amount of text information has come to be presented in the form of images. Using computers or edge devices to extract text from images automatically and accurately will further advance intelligent industry. Natural scene text detection and recognition is a technology that automatically detects text in natural scene images and recognizes the corresponding text content. It is widely used in autonomous driving, scene analysis, smart cities, and other applications. With the development of deep learning and computer vision, a large body of work on text detection and recognition in natural scenes has emerged. However, most of this work targets languages such as Chinese or English, and scene text detection and recognition for Vietnamese has received little attention. To this end, this paper studies the detection and recognition of Vietnamese text in natural scenes, as follows: 1. This paper proposes a Vietnamese scene text detection method based on edge attention. To address the large scale variation of Vietnamese text targets in natural scenes, this paper proposes the receptive field residual block (RFRB), which uses dilated convolutions with different dilation rates to form rich receptive fields that adapt to multi-scale Vietnamese scene text targets. Because Vietnamese scene text detection needs more robust and richer features, especially low-level text features, this paper proposes a multi-channel fusion feature pyramid network (MF-FPN), in which each output feature map contains rich feature information, which helps in detecting diacritics. To address false detections caused by background interference and by diacritics, this paper proposes a Re-Score 
mechanism that adds a scoring branch for the semantic sequence, making the confidence of the target category more accurate and thereby eliminating false positives. Compared with Latin letters, the diacritics in Vietnamese text are small and easily overlooked in natural scenes, so Vietnamese scene text targets are often not detected completely. To address this, this paper designs the edge attention mechanism (EAM), which multiplies the predicted target edge probability map by the intermediate features of the prediction branch to form edge attention. This guides the prediction branch of the model to pay more attention to target edges and thus to segment Vietnamese scene text targets accurately. The detection performance of the method is verified through extensive ablation and comparative experiments. 2. This paper proposes a Vietnamese scene character recognition method based on sequence modeling features. Vietnamese scene character recognition suffers from severe attention drift, and diacritics (tone marks) are easily overlooked during recognition. To address these problems, this paper proposes the visual feature and sequence feature fusion module (VSFM), which strengthens the sequential order of the attention map by using Bi-GRU to model sequence features in the horizontal and vertical directions respectively, thereby alleviating attention drift and strengthening the connection between diacritics and Latin letters. Because there are many character categories to recognize and the differences between characters are small (many characters are nearly identical in shape), recognition is difficult. This paper therefore designs an Enhanced Decoupled Text Decoder, which integrates the vertical sequence modeling features with visual features during classification, further strengthening the attention to diacritics and the discriminative ability of the classifier. Through extensive 
experiments, this paper examines the effectiveness of VSFM and the Enhanced Decoupled Text Decoder, as well as the recognition performance of the method. |
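
Two of the operations named in the abstract reduce to simple arithmetic, and the Python sketch below illustrates them under stated assumptions. It is not the paper's implementation: the function names `effective_kernel_size` and `edge_attention`, the kernel size of 3, the dilation rates 1, 2 and 3, and the toy 2x2 maps are hypothetical choices made only for illustration. The first function uses the standard identity for the extent of a dilated kernel, which shows why mixing dilation rates yields several receptive field sizes. The second performs the element-wise product of an edge probability map and a feature map, which is how the abstract describes edge attention.

```python
from typing import List

Matrix = List[List[float]]


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Extent along one axis covered by a single dilated convolution kernel."""
    # Dilation inserts (dilation - 1) gaps between neighbouring kernel taps.
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def edge_attention(features: Matrix, edge_prob: Matrix) -> Matrix:
    """Weight a single-channel feature map by an edge probability map."""
    same_rows = len(features) == len(edge_prob)
    same_cols = all(len(f) == len(p) for f, p in zip(features, edge_prob))
    if not (same_rows and same_cols):
        raise ValueError("feature map and edge map must have the same shape")
    # Element-wise product: positions with low edge probability are suppressed.
    return [
        [f * p for f, p in zip(f_row, p_row)]
        for f_row, p_row in zip(features, edge_prob)
    ]


# Assumed dilation rates; the abstract does not state the rates used in RFRB.
spans = [effective_kernel_size(3, d) for d in (1, 2, 3)]

# Toy single-channel feature map and edge probabilities in [0, 1] (made up).
features = [[2.0, 4.0], [1.0, 3.0]]
edge_prob = [[0.5, 0.0], [1.0, 0.5]]
attended = edge_attention(features, edge_prob)

print(spans)
print(attended)
```

In a real detector the same product would be applied per channel to tensors, with the edge map broadcast across channels. Whether the paper also adds a residual connection around the product is not stated in the abstract, so none is assumed here.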