
Research On Multi-task Cascade Scene Text Detection

Posted on: 2020-03-13  Degree: Master  Type: Thesis
Country: China  Candidate: X B Guo  Full Text: PDF
GTID: 2428330611999749  Subject: Computer technology
Abstract/Summary:
Text, as an important carrier of information, conveys high-level semantic information, and scene text is a form of it that appears throughout daily life. In recent years, scene text detection has become one of the most prominent research directions in computer vision and document analysis, attracting close attention from both academia and industry. Scene text detection is an important part of scene OCR and has been widely applied to tasks such as license plate detection and recognition, card text detection and recognition, scene text sentiment analysis, and driverless technology.

With the advancement of deep learning, deep-learning-based methods have gradually become the mainstream in scene text detection. Although these methods have improved significantly, existing detectors still have limitations because of the complexity of scene text. Compared with general documents, scene text usually shows complex backgrounds, varied illumination, and varied scales and aspect ratios. Further challenges come from the text itself: text variety (scene text appears in different languages, and each language contains multiple text types), arrangement direction (horizontal, multi-oriented, curved, etc.), and cluttered visual features (multiple text categories with few structural commonalities).

To alleviate these problems, this work analyzes detection-based and segmentation-based methods, studies feature extraction and feature fusion, and builds models with a multi-task cascade approach. Two solutions are proposed.

The first solution, upsampling feature fusion with auxiliary regression, mainly addresses multi-oriented text detection. For feature extraction and fusion, upsampling feature fusion is adopted, and a multi-branch context module extracts discriminative text features, which helps gather region proposals. For the multi-task cascade, the main idea is to combine a coordinate-aligned scene text detection method with an instance segmentation method; an auxiliary regression based on center points and corner points is added on top of the segmentation branch to improve the accuracy of multi-oriented text detection.

The second solution, feature pyramid network fusion, addresses both multi-oriented and curved text detection. It uses a stronger feature extraction and fusion method, the feature pyramid network, to combine high-level semantic information with low-level position information effectively and to extract targets of the corresponding scale at each feature level, so that the subsequent classification, regression and segmentation tasks receive sufficient features. For the multi-task cascade, a general instance segmentation method is adopted, while online hard example mining and synchronized batch normalization are used during training, which greatly improves model convergence.

Experiments compare the feature extraction of the two solutions and show that the second has stronger feature fusion ability. Experiments on multi-oriented, multi-language and curved text datasets demonstrate that the two multi-task cascade scene text detection models proposed in this work achieve state-of-the-art performance on multiple scene text detection datasets.
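
The abstract names online hard example mining as a training technique but does not say how samples are selected. The following is a minimal sketch of one common variant, not the thesis's implementation: all positive (text) samples are kept, and only the highest-loss negatives are kept, up to a fixed ratio per positive. The function names `ohem_select` and `ohem_mean_loss`, the default ratio of 3, and the fallback used when no positives exist are all assumptions made for illustration.

```python
def ohem_select(losses, labels, neg_pos_ratio=3):
    """Return sorted indices of the samples kept for the loss.

    All positives (label 1) are kept. Negatives (label 0) are ranked by
    loss and only the hardest are kept, up to neg_pos_ratio per positive.
    The ratio and the no-positive fallback are illustrative assumptions.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # With no positives, still keep neg_pos_ratio negatives (assumption).
    n_keep = min(len(neg), max(neg_pos_ratio * len(pos), neg_pos_ratio))
    hardest = sorted(neg, key=lambda i: losses[i], reverse=True)[:n_keep]
    return sorted(pos + hardest)


def ohem_mean_loss(losses, labels, neg_pos_ratio=3):
    """Average the per-sample losses over the mined subset only."""
    keep = ohem_select(losses, labels, neg_pos_ratio)
    return sum(losses[i] for i in keep) / len(keep) if keep else 0.0


if __name__ == "__main__":
    # Hypothetical per-pixel losses and labels (1 = text, 0 = background).
    losses = [0.9, 0.1, 0.8, 0.05, 0.7, 0.2]
    labels = [1, 0, 0, 0, 0, 0]
    print(ohem_select(losses, labels, neg_pos_ratio=2))
```

In a real detector the same selection would typically run per image on the classification or segmentation loss map, so that the many easy background pixels do not dominate the gradient.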
Keywords/Search Tags:scene text detection, feature fusion, multi-task cascade, instance segmentation