Research On Key Technologies Of Text Detection And Recognition In Scene Images

Posted on: 2023-06-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Zhou
GTID: 1528306905481634
Subject: Information and Communication Engineering
Abstract/Summary:
Text is an important means by which people obtain information and communicate. Reading text from images accurately is crucial to production and daily life. Existing methods usually divide text reading into three sub-tasks: text detection, text recognition, and end-to-end text recognition. Text detection locates the text in an image. Text recognition recognizes the character sequence in a text region. End-to-end text recognition locates the text region and recognizes the corresponding character sequence simultaneously. These technologies are widely used in autonomous driving, commodity retrieval, text translation, and other fields. Automatically detecting and recognizing text in scene images can significantly reduce labor costs, improve work efficiency, and promote the development of intelligent information processing. The study of scene text detection and recognition therefore has important application value and scientific research value.

However, accurately detecting and recognizing scene text still faces the following challenges. Firstly, scene text is densely distributed and varies widely in scale, which makes neighboring text instances hard to distinguish; when multi-scale text is detected, large text is incompletely detected and small text is missed. Secondly, because scene text is complex, some text is blurred or partially occluded, and such difficult samples are hard to recognize correctly from visual features alone. In addition, end-to-end text recognition models suffer from error accumulation and from imbalanced training samples between the text detection module and the text recognition module. These two problems make it difficult to optimize the two modules effectively at the same time, which limits the overall performance of the model. Finally, the difficulty of labeling text and the arbitrariness of the reading direction hinder further improvement of text detection and recognition performance. This dissertation therefore conducts research on three aspects: text detection, text recognition, and end-to-end text recognition, and achieves the following results.

1. A center-aware text detection algorithm and a scale-aware text detection algorithm are proposed.
Scene images contain many densely distributed texts. Lacking discriminative location information, existing methods tend to detect adjacent text instances as a single instance. To solve this problem, this dissertation proposes a center-aware text detection algorithm that explicitly learns discriminative text center point information. Using the center point as guidance to distinguish the boundary pixels of adjacent texts, the algorithm can effectively detect dense adjacent texts. However, like existing methods, it produces incomplete boundaries when detecting dense multi-scale texts, and it tends to split large texts into multiple fragments or to miss small texts. This dissertation therefore further proposes a scale-aware semi-supervised text detection algorithm built on the center-aware algorithm. A multi-scale feature extraction module and a scale-aware loss function improve the model's ability to detect multi-scale text. The text reconstruction process of the center-aware algorithm is also improved to obtain more complete text boundaries. Additionally, an effective semi-supervised text detection algorithm is proposed that uses unlabeled data to improve the performance of the text detector. Experimental results show that the proposed methods can effectively detect dense texts and multi-scale texts, and achieve optimal performance without a large amount of labeled data.

2. A text recognition algorithm based on visual and semantic mutual promotion is proposed.
Scene text has complicated backgrounds and uncontrolled imaging conditions, so many texts are blurred or partially occluded. When recognizing such text, a visual model alone cannot extract effective and sufficient features, which causes recognition errors. Language models can alleviate this problem to a certain extent, but for particularly ambiguous texts and for texts containing non-English words, correction that relies only on a language model has little effect. This dissertation therefore proposes a text recognition algorithm in which visual and semantic information promote each other. The algorithm aligns visual features with semantic information and uses a visual-semantic feature enhancement module to enhance both and let them interact. Finally, semantic information is forced to be encoded into the visual features by hiding the visual features of the character currently being recognized. Experimental results show that the proposed method effectively improves the model's ability to recognize blurred and partially occluded text, and achieves leading performance on six datasets.

3. Hybrid attention-based and sample generation-based end-to-end text recognition algorithms are proposed.
In an end-to-end text recognition system, errors accumulate between the text detection module and the text recognition module. Incorrect detection results lead to incorrect recognition results, and incorrect recognition results in turn hinder the optimization of the text detection module. To solve this problem, this dissertation proposes a hybrid attention-based end-to-end text recognition algorithm. Hybrid attention can effectively eliminate noise interference in both the text detection stage and the text recognition stage. In addition, a novel hybrid decoder is proposed to improve the performance of the text recognition module, so that incorrect recognition results do not hinder the optimization of the text detection module. These two measures effectively alleviate the problem of error accumulation between the two modules. To address the sample imbalance between the text detection module and the text recognition module, this dissertation further proposes a sample generation algorithm that balances the training samples of the two modules, and designs a text direction perception module that perceives the reading direction of text, which effectively improves the model's ability to recognize text in arbitrary directions. The sample generation algorithm and the direction perception module improve the accuracy of text recognition and thereby further alleviate error accumulation. Finally, this dissertation proposes a novel character generation algorithm that generates accurate character-level labels from word-level labels, which the sample generation algorithm relies on. As a result, the proposed method achieves leading performance on commonly used public datasets.

In summary, this dissertation proposes corresponding solutions to the challenges of text detection, text recognition, and end-to-end text recognition. The experimental results show that the proposed methods effectively address these challenges, which demonstrates their effectiveness.
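
The center-guided idea in item 1 (using center points to decide which instance each boundary pixel belongs to) can be illustrated with a toy sketch. This is not the dissertation's algorithm: the abstract does not specify how centers guide the grouping, so the nearest-center rule, the function name `assign_pixels_to_centers`, and the hand-written mask and centers below are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (row, col)


def assign_pixels_to_centers(mask: List[List[int]],
                             centers: List[Point]) -> List[List[int]]:
    """Label each text pixel with the index of its nearest center.

    Background pixels (mask value 0) keep the label -1.
    Assumption: nearest-center assignment stands in for whatever
    center-guided grouping the dissertation actually uses.
    """
    labels = [[-1] * len(row) for row in mask]
    if not centers:
        return labels  # nothing to assign to
    for r, row in enumerate(mask):
        for c, is_text in enumerate(row):
            if not is_text:
                continue
            # Squared Euclidean distance to every center; keep the closest.
            labels[r][c] = min(
                range(len(centers)),
                key=lambda i: (r - centers[i][0]) ** 2 + (c - centers[i][1]) ** 2,
            )
    return labels


# Two adjacent words whose pixels touch, forming one connected blob.
mask = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
]
centers = [(0, 1), (0, 4)]  # hypothetical predicted center points
labels = assign_pixels_to_centers(mask, centers)
```

A connected-component pass over `mask` would return a single instance here, while the center-guided labels split the blob into two instances, which is the failure mode on dense text that the abstract describes.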
Keywords/Search Tags:Text Detection, Text Recognition, End-to-end Text Recognition, Weak Supervision, Semi-supervision