Research On The Method Of Video Text Detection And Online Tracking In Natural Scene

Posted on: 2021-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: J P Mei
GTID: 2518306107467984
Subject: Electronics and Communications Engineering
Abstract/Summary:
Text is one of the greatest inventions of mankind. It carries rich semantic information and plays a pivotal role in people's lives. With the rapid development of mobile Internet technology and the spread of electronic devices, we have entered an era of big data, and the Internet is flooded with video. The text in a video can often express the key content of that video more accurately, so text plays a vital role in image processing and video analysis. How to extract and analyze text from video data accurately and efficiently has therefore become an active research topic in recent years. Natural scene text is more challenging than text in scanned documents: backgrounds are complex and diverse, multiple languages are mixed, and text regions may be deformed, incomplete, or blurred. Scene text in video suffers from further problems such as motion blur and defocus. Existing text detection and tracking algorithms usually work in two stages: a text detector first detects the text in the video, and a separate tracking framework then uses the detection results to associate data between consecutive frames. This design adds computational cost to the whole system and separates detection from tracking, so neither task uses the other's supervision information. To address this, this thesis conducts an in-depth study of text detection and online tracking in video by combining convolutional neural networks with Hungarian matching and Kalman filtering. The main work of this thesis is as follows:

(1) An efficient multi-directional text detection network is designed for scene video text detection. The network generates feature maps at different resolutions based on a feature pyramid structure and fuses them. Anchors with rotation angles are learned by the network, and these learned anchors then serve as references on the feature map for matching any text boxes that may be present. By cascading the direct prediction of each pixel's category and its offset from the anchor, a feature refinement module is designed to align the output boxes of the first stage with the feature points. Experiments show that the detector handles both horizontal and multi-directional text and runs at near real-time speed. It achieves an F-score of 81.6 on the ICDAR2015 dataset, which is superior to existing one-stage text detectors.

(2) Scene text video often suffers from blur, defocus, and occlusion. In this thesis, the feature map of the previous frame is fed into a purpose-built feature adjustment network to obtain a feature map adjusted from the previous frame to the current frame. The correlation weights between the adjusted feature map and the feature map of the current frame are then computed and normalized. Finally, these weights are applied at each spatial position of each preceding feature map to enhance the features used for video text detection.

(3) An end-to-end trainable framework for video text detection and online tracking is proposed. The tracking model and the detection network share the weights of the feature extraction network. A tracking representation branch is added directly on top of the detection network, and the trained representation is fused with the detection results. Beyond the characteristics of the text itself, the feature representation of each text instance is combined with the semantic information of the text as the basis for comparison in data association, and online tracking uses the Hungarian algorithm to associate data between consecutive frames. The end-to-end framework removes the separate feature extraction of the tracking branch and makes full use of the supervision information of both tasks. The entire system achieves a good balance between speed and accuracy.
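
The frame-to-frame data association step described in (3) can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes axis-aligned boxes and a cost of 1 - IoU, whereas the thesis uses rotated text boxes and a learned representation combined with semantic information, and it leaves out the Kalman-filter prediction step. The names `iou`, `hungarian`, `associate`, and the `min_iou` threshold are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# Assumption: boxes are axis-aligned (x1, y1, x2, y2); the thesis uses rotated boxes.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def hungarian(cost: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Minimum-cost assignment (Hungarian algorithm with potentials).

    Requires len(cost) <= len(cost[0]); returns (row, column) pairs.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j]: 1-based row assigned to column j (0 = free)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:  # flip assignments along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return [(p[j] - 1, j - 1) for j in range(1, m + 1) if p[j] != 0]


def associate(tracks: Sequence[Box], detections: Sequence[Box], min_iou: float = 0.3):
    """Match previous-frame tracks to current-frame detections.

    Returns (matches, unmatched_track_indices, unmatched_detection_indices).
    """
    if not tracks or not detections:
        return [], list(range(len(tracks))), list(range(len(detections)))
    cost = [[1.0 - iou(t, d) for d in detections] for t in tracks]
    if len(tracks) <= len(detections):
        pairs = hungarian(cost)
    else:
        # More tracks than detections: solve the transposed problem.
        transposed = [list(col) for col in zip(*cost)]
        pairs = [(t, d) for d, t in hungarian(transposed)]
    # Reject assignments whose overlap is too small to be the same text instance.
    matches = sorted((t, d) for t, d in pairs if 1.0 - cost[t][d] >= min_iou)
    matched_t = {t for t, _ in matches}
    matched_d = {d for _, d in matches}
    lost = [t for t in range(len(tracks)) if t not in matched_t]
    new = [d for d in range(len(detections)) if d not in matched_d]
    return matches, lost, new
```

In an online tracker built this way, each match would keep its track ID, unmatched detections would start new tracks, and unmatched tracks would be kept for a few frames or dropped. Replacing the IoU cost with a distance between learned text-instance embeddings would bring the sketch closer to the approach the abstract describes.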
Keywords: Text detection, Neural networks, Online tracking