
A Research On Detection Methods For Scene Text In Natural Videos

Posted on: 2020-11-10
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
Full Text: PDF
GTID: 2428330575955067
Subject: Computer Science and Technology

Abstract/Summary:
With the rapid development of acquisition, storage, and processing technologies for digital media, and with increasingly extensive application demands from users, the amount of video data of natural scenes has been growing rapidly, which poses urgent demands for new methods and techniques for extracting useful information from videos. As an important type of visual object in natural scene images and videos, text carries rich and important semantic information about video content. It is of significant value to research and applications in the analysis, classification, summarization, retrieval, and recommendation of video data, and has therefore attracted the research interest of the computer vision, pattern recognition, and image processing communities. Meanwhile, although video text shares the diverse and complicated appearance and contextual interferences, such as complex backgrounds and object occlusion, of text in static natural scene images, it also exhibits some distinct properties, such as blurring and degradation caused by motion, varied viewpoints and resolutions, and temporal correlation and redundancy of text cues. These characteristics bring extra difficulty to detection, but on the other hand they can also be exploited to assist detection. However, compared with research on text detection in static images, there are far fewer research results on video text detection, and the existing technical schemes are relatively simple. This thesis conducts in-depth research on scene text detection in videos and proposes two novel and effective video text detection methods.

In view of the correlations of text cues across adjacent video frames, this thesis proposes a video text detection method that combines efficient intra-frame text detection with effective cross-frame tracking of text. The method first combines a Convolutional Neural Network (CNN) based classifier that operates on connected components with the deep neural network text detector SegLink to improve the accuracy of the resulting text region candidates. Then, the thesis filters the text region candidates by exploiting the correlations between text and its background with a bipartite graph model and the random walk algorithm, which reduces the likelihood of false text candidates and increases the likelihood of text candidates supported by their corresponding background. Next, the thesis devises two effective text tracking algorithms, one based on motion estimation and the other based on correlation filters, and combines the tracking and detection results based on an analysis of text tracking trajectories, which improves both the precision and recall of the detection results.

Different from the above tracking-based late fusion scheme for video text detection, this thesis further proposes a novel end-to-end deep neural network model for detecting scene text in videos, which adopts an adaptive early fusion scheme for features extracted from multiple neighboring frames, so as to overcome the insufficiency of the late fusion scheme in terms of global optimization and reliable detection. The model first exploits a multi-level network structure with deformable convolution blocks to hierarchically extract and combine features of different spatial resolutions and semantic abstraction levels from every frame. Next, the model spatiotemporally samples complementary features from adjacent frames with a multi-scale deformable convolution structure, and then aggregates the features into an augmented feature representation with an attention-based weighting mechanism, which is further fed to a regression-based text prediction module for localizing text in the video frame.
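The abstract describes the bipartite-graph and random-walk filtering of text region candidates only at a high level. As a rough illustration of the general idea, the sketch below re-scores text candidates with a random walk with restart over a bipartite graph linking candidates to background regions; the affinity matrix, restart parameter, and function names are illustrative assumptions, not the formulation used in the thesis.

    import numpy as np

    def random_walk_rescore(affinity, init_scores, restart=0.15, iters=50):
        """Re-score text candidates via a random walk on a bipartite graph.

        affinity:    (num_text, num_background) nonnegative matrix linking each
                     text candidate to background regions (e.g. a spatial or
                     appearance affinity; the exact definition is assumed here).
        init_scores: (num_text,) detector confidences used as the restart vector.
        Returns refined (num_text,) scores.
        """
        # Build the full bipartite adjacency and row-normalize it into a
        # transition matrix over the combined node set (text nodes first).
        n_t, n_b = affinity.shape
        adj = np.zeros((n_t + n_b, n_t + n_b))
        adj[:n_t, n_t:] = affinity
        adj[n_t:, :n_t] = affinity.T
        row_sum = adj.sum(axis=1, keepdims=True)
        trans = np.divide(adj, row_sum, out=np.zeros_like(adj), where=row_sum > 0)

        # Restart distribution concentrated on text candidates, weighted by
        # their initial detection confidences.
        restart_vec = np.zeros(n_t + n_b)
        restart_vec[:n_t] = init_scores / (init_scores.sum() + 1e-8)

        p = restart_vec.copy()
        for _ in range(iters):
            p = (1 - restart) * trans.T @ p + restart * restart_vec
        return p[:n_t]

In this sketch, candidates that receive probability mass through background regions shared with other confident candidates end up with higher scores, matching the intuition of suppressing false candidates while boosting those supported by their background.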
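Similarly, the attention-based weighting mechanism that aggregates features sampled from neighboring frames is only sketched in the abstract. The following is a minimal, hypothetical PyTorch example of one such weighting scheme, assuming the neighboring-frame features have already been spatially aligned (e.g., by the multi-scale deformable convolution sampling mentioned above); the cosine-similarity weighting and the function name are assumptions made for illustration.

    import torch
    import torch.nn.functional as F

    def attention_fuse(ref_feat, neighbor_feats):
        """Fuse per-frame feature maps into one augmented representation.

        ref_feat:       (C, H, W) features of the frame to be detected.
        neighbor_feats: (T, C, H, W) features sampled from T neighboring frames,
                        already aligned to the reference frame's spatial grid.
        """
        # Stack the reference frame with its neighbors: (T+1, C, H, W).
        stacked = torch.cat([ref_feat.unsqueeze(0), neighbor_feats], dim=0)
        # Per-location similarity of each frame to the reference frame: (T+1, H, W).
        sim = F.cosine_similarity(stacked, ref_feat.unsqueeze(0), dim=1)
        # Softmax over frames gives per-location attention weights: (T+1, 1, H, W).
        weights = F.softmax(sim, dim=0).unsqueeze(1)
        # Weighted sum over frames yields the augmented feature map: (C, H, W).
        return (weights * stacked).sum(dim=0)

    # Example: fuse a reference frame with two aligned neighboring frames.
    fused = attention_fuse(torch.randn(256, 32, 32), torch.randn(2, 256, 32, 32))

Here each spatial location weights the frames by softmax-normalized similarity to the reference frame, so neighboring frames that are consistent with the current frame contribute more to the augmented representation.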
The thesis evaluates the performance of the proposed video text detection methods through experiments on several public scene-text video datasets. The experimental results show that, compared with existing approaches, which mostly adopt early fusion mechanisms and tracking measures, the proposed methods achieve better detection performance, demonstrating their effectiveness in video text detection.
Keywords/Search Tags: Scene text, video, text detection, background, deep neural network, multi-frame integration, tracking