Scene Text Detection From Scene Images And Videos

Posted on: 2019-05-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Y Pei
GTID: 1318330548957867
Subject: Computer Science and Technology
Abstract/Summary:
Computer vision has been a very active research field in recent years, and image semantic analysis is one of its most important problems. Some studies in the literature report that text accounts for more than 70% of the semantic information in an image, so detecting and recognizing text in images and videos is an important way to extract and use that information. Optical Character Recognition (OCR) for printed documents and books has been studied thoroughly. In natural scenes, however, text usually appears in highly non-standard forms with noise and deformation, and traditional OCR technology performs poorly at extracting and recognizing it. With the development of technology, especially mobile technology, people are no longer satisfied with recognizing simple printed text and expect recognition to work in natural scenes as well.

Text detection is the fundamental task here, and many novel methods have been proposed for it, but multi-orientation text detection in natural scenes still faces several challenges. First, character detection is inaccurate: because natural scenes are complex, characters vary widely in appearance and many are hard to detect. Second, text noise is difficult to filter out: natural scenes contain many text-like regions, and detectors often misclassify them. Third, text direction is difficult to determine: text may be arranged in any way, and in some languages, such as Chinese and Japanese, a single character can be split into several regions, which makes the direction harder to judge. In response to these problems, this work presents a series of studies on key technologies for horizontal text detection, multi-orientation text detection, and video text detection in natural scenes.

First, we address character extraction and text discrimination, and propose a scene text detection method with two novel components: multi-information fusion character extraction and a multi-classifier text filter. The character extraction step clusters connected components with a hierarchical clustering algorithm, and then uses the overall characteristics of the connected components in each cluster to fuse the connected components from multiple channels, so that as many characters as possible are retained. Experiments on the ICDAR dataset show that character recall rises from 92% when using the gray channel alone to 98% when using the fusion method. The multi-classifier text discrimination algorithm distinguishes text candidates with high precision by fusing several different discriminators, and the discriminator based on a CNN sliding window is particularly effective at filtering out text-like regions.

Second, for multi-orientation text detection, we propose an adaptive clustering algorithm based on a metric learning framework, and use it to design a coarse-to-fine algorithm for constructing multi-orientation text candidates. We apply this metric learning method to both the single-link clustering and the binary clustering algorithms, with good results. When constructing multi-orientation text lines, we determine the direction of each text line by applying morphological clustering, direction clustering, and intercept clustering in turn. The method is evaluated on several public datasets (ICDAR15, MSRA-TD500, USTB-SV1K) and reached state-of-the-art performance at the time.

In real scenes, occluded text cannot be detected from a single image. To solve this problem, we turn from static images to videos and exploit their spatio-temporal continuity. The third contribution of this work is therefore a video text tracking and detection method based on an energy-minimization optimization algorithm that uses the spatio-temporal continuity of video and second-order features of text. By adding an exclusion energy model, we use the relationships between pairs of targets to extract second-order features of the text, which improves the model's ability to tell similar text instances apart. In the evaluation on the ICDAR15 dataset, the method achieves a significantly higher MOTA (Multiple Object Tracking Accuracy) than other methods, indicating that it is effective at preventing target ID switches. Compared with detection alone, the detection system combined with tracking is more robust across different scenes.
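
The abstract reports MOTA without defining it. As a minimal sketch, assuming the standard CLEAR MOT definition (one minus the ratio of misses, false positives, and ID switches to the total number of ground-truth objects, summed over all frames), the metric could be computed as below. The names `FrameCounts` and `mota` are hypothetical, the counts in the usage note are invented for illustration and are not results from the dissertation, and the exact matching protocol used in the ICDAR15 evaluation is not described here.

```python
from typing import Iterable, NamedTuple


class FrameCounts(NamedTuple):
    """Per-frame tracking error counts (hypothetical container for this sketch)."""
    misses: int           # ground-truth text instances with no matched track
    false_positives: int  # tracked boxes that match no ground-truth instance
    id_switches: int      # matched instances whose track ID changed
    ground_truth: int     # number of ground-truth text instances in the frame


def mota(frames: Iterable[FrameCounts]) -> float:
    """Return MOTA = 1 - (misses + false positives + ID switches) / ground truth,
    with every count summed over all frames (standard CLEAR MOT form, assumed)."""
    frames = list(frames)
    total_ground_truth = sum(f.ground_truth for f in frames)
    if total_ground_truth == 0:
        raise ValueError("MOTA is undefined when there are no ground-truth objects")
    total_errors = sum(
        f.misses + f.false_positives + f.id_switches for f in frames
    )
    return 1.0 - total_errors / total_ground_truth
```

For example, `mota([FrameCounts(1, 0, 0, 5), FrameCounts(0, 1, 1, 5)])` sums three errors over ten ground-truth instances. Because ID switches enter the numerator directly, a tracker that keeps identities stable scores higher even when its per-frame detections are unchanged, which is consistent with how the abstract interprets its MOTA gain.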
Keywords/Search Tags:Scene Text Detection, Multi-orientation Text Detection, Text Tracking, Energy Model