Research And Implementation Of Text Recognition In Video

Posted on: 2020-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: J L Tang
GTID: 2428330596475075
Subject: Computer Science and Technology
Abstract/Summary:
With the development of the Internet, video and images have become the main data resources in daily life. Text carries high-level semantic information and is the most direct explanation of video content, so it can be used for video retrieval, classification and understanding. Image text recognition has attracted a lot of attention in computer vision, and many algorithms have been proposed that meet the performance requirements of industrial production. In industry, however, text recognition is mostly applied to video, so it is more meaningful to study how to extract text from successive frames.

Text extraction proceeds through the steps described below. First, text areas are detected in a single frame image. Second, the movement of text between frames is tracked. Third, the text area in each track is recognized one by one. Some methods improve the recognition rate through multi-frame fusion. Although this approach has proved effective in practice, it still has several shortcomings. First, text detection based on a single frame does not make full use of the spatiotemporal characteristics of video text. Second, multi-frame image fusion is based on key frame selection, and fusing low-resolution images results in blurred images. Third, text recognition modules usually use an RNN to construct the language model, but the specific structure of the RNN limits the efficiency of the model and hence of recognizing text regions.

This thesis studies these three problems and designs a fast video text reading framework comprising text detection, text tracking and text recognition. The main research contents are as follows:

1. A video text detection model based on multi-frame fusion is proposed. Text in a video stays invariant with respect to the background for a period of time, so processing each frame independently may lose the connection between frames. In this thesis, n adjacent frames are selected; after they pass through the feature network, an attention network automatically selects and merges their features to extract more expressive features.

2. A text region classification algorithm is designed. It is integrated with the text recognition model in a multi-task manner, and weakly supervised learning is performed using the text recognition loss, which avoids manual labeling. For each text stream produced by tracking, not every text area is recognized; instead, the area with the best quality is selected for recognition, which improves recognition accuracy and reduces computational cost. Research on sequence-based text recognition algorithms is one of the key points of this thesis.

3. A fully convolutional text recognition algorithm is designed that improves recognition speed at the cost of only a small loss of precision. Spatial attention and channel attention modules introduced in the feature encoder enhance the network's attention to foreground text and suppress background noise.
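The idea of recognizing only the best-quality region in each text track can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `TextRegion` structure, the `quality` score (assumed to come from some region classifier) and the `recognizer` callable are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class TextRegion:
    """One tracked text box in one frame (hypothetical structure)."""
    frame_index: int
    crop: Any        # cropped image data; the concrete type is left open
    quality: float   # assumed classifier score, higher means more legible


def select_best_region(track: Sequence[TextRegion]) -> TextRegion:
    """Return the highest-quality region of a track (first one on ties)."""
    if not track:
        raise ValueError("a track must contain at least one region")
    return max(track, key=lambda region: region.quality)


def recognize_tracks(
    tracks: Sequence[Sequence[TextRegion]],
    recognizer: Callable[[Any], str],
) -> List[str]:
    """Run the recognizer once per track, on the best region only."""
    return [recognizer(select_best_region(track).crop) for track in tracks]
```

Under these assumptions the recognizer runs once per track rather than once per frame, which is where the reduction in computational cost would come from.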
Keywords/Search Tags:video text detection, video text recognition, video text tracking, attention mechanism