Research And Implementation Of Text Recognition In Video

Posted on:2020-08-26

Degree:Master

Type:Thesis

Country:China

Candidate:J L Tang

Full Text:PDF

GTID:2428330596475075

Subject:Computer Science and Technology

Abstract/Summary:

With the development of the Internet, video and images have become the main data resources in daily life. Text carries high-level semantic information and is the most direct explanation of video content, so it can be used for video retrieval, classification and understanding. Image text recognition has attracted a lot of attention in computer vision, and many algorithms have been proposed that meet the performance requirements of industrial production. In industry, however, text recognition is mostly applied to video, so it is more meaningful to study how to extract text from successive frames. The text extraction process goes through the steps described below. First, text regions are detected on a single frame. Second, the movement of text between frames is tracked. Third, the text regions in each track are recognized one by one. Some methods improve the recognition rate through multi-frame fusion. Although this approach has proved effective in practice, it still has several shortcomings: first, text detection based on a single frame does not make full use of the spatiotemporal characteristics of video text, and multi-frame image fusion depends on key frame selection, while fusing low-resolution images produces blurred images; second, text recognition modules usually use an RNN to build the language model, but the structure of the RNN limits the efficiency of the model and therefore of text region recognition. This thesis studies these three problems and designs a fast video text reading framework comprising text detection, text tracking and text recognition. The main research contents are as follows:

1. A video text detection model based on multi-frame fusion is proposed. Text in a video remains invariant relative to the background for a period of time, so processing each frame independently may lose the connection between frames. In this thesis, n adjacent frames are selected; after they pass through the feature network, an attention network automatically selects and merges their features to extract more expressive features.

2. A text region classification algorithm is designed. It is integrated with the text recognition model in a multi-task manner, and weakly supervised learning is performed using the text recognition loss, which avoids manual labeling. For each text stream generated by tracking, not every text region is recognized; instead, the region with the best quality is selected for recognition, which improves recognition accuracy and reduces computational cost. Research on sequence-based text recognition algorithms is one of the key points of this thesis.

3. A fully convolutional text recognition algorithm is designed to improve recognition speed with only a small loss of accuracy. The spatial attention and channel attention modules introduced into the feature encoder enhance the network's attention to foreground text and suppress background noise.
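
The tracking and best-quality selection steps described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the greedy IoU association, the `Detection` and `Track` types, the `quality` field (standing in for the score of a text region classifier) and the 0.5 threshold are all assumptions made for illustration. It only shows the idea of linking per-frame text boxes into tracks and then sending one region per track to the recognizer.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Detection:
    frame: int      # index of the frame the box was detected in
    box: Box        # text region in pixel coordinates
    quality: float  # hypothetical quality score (higher is better)


@dataclass
class Track:
    detections: List[Detection] = field(default_factory=list)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def track_detections(frames: List[List[Detection]],
                     iou_threshold: float = 0.5) -> List[Track]:
    """Greedy association (an assumption, not the thesis's tracker):
    each detection extends the track whose box in the previous frame
    overlaps it most, or starts a new track if none overlaps enough."""
    tracks: List[Track] = []
    for detections in frames:
        used = set()  # tracks already extended in this frame
        for det in detections:
            best_idx, best_iou = -1, iou_threshold
            for idx, track in enumerate(tracks):
                last = track.detections[-1]
                # Only link to tracks last seen in the previous frame.
                if idx in used or last.frame != det.frame - 1:
                    continue
                overlap = iou(last.box, det.box)
                if overlap >= best_iou:
                    best_idx, best_iou = idx, overlap
            if best_idx < 0:
                tracks.append(Track([det]))
                used.add(len(tracks) - 1)
            else:
                tracks[best_idx].detections.append(det)
                used.add(best_idx)
    return tracks


def best_per_track(tracks: List[Track]) -> List[Detection]:
    """Pick the highest-quality region of each track for recognition."""
    return [max(t.detections, key=lambda d: d.quality) for t in tracks]
```

With this layout a recognizer would run once per track on the result of `best_per_track(...)` rather than once per detected region, which is where the reduction in computational cost mentioned above would come from.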

Keywords/Search Tags:

video text detection, video text recognition, video text tracking, attention mechanism

Related items

1	Research On Video Text Extraction And The Application In Virtual Karaoke
2	Research On The Technology Of Video Text Information Extraction
3	Research On Video OCR
4	Research On Text And Specific Object Detection Algorithm In Images And Videos
5	Text Extraction In Video
6	The Research On The Method Of High-Definition Video Text Extraction And Recognition
7	Research On Video Text Recognition Of Natural Scene Based On Image Stitching Technology
8	Inter-Frame Data Association Based Method For Text Tracking
9	Research On Video Text Information Extraction Based On Features Integration
10	Research On Text Detection In Images And Video Frames