Inter-Frame Data Association Based Method For Text Tracking

Posted on:2023-06-13

Degree:Master

Type:Thesis

Country:China

Candidate:X Li

GTID:2568307172457454

Subject:Information and Communication Engineering

Abstract/Summary:

With the rapid development of video acquisition, storage, and processing technologies, the volume of video data has grown significantly. Video data contains a large number of text targets, and text, as a carrier of human information, conveys rich semantic information. Detecting and tracking scene text in video has become a key step in many applications, such as video retrieval, video content understanding, and autonomous driving. The video text tracking task therefore has broad research prospects and application scenarios. However, existing algorithms still face many challenges due to the diversity of video scenes, strong illumination changes, and motion blur. This dissertation takes the Intersection over Union (IoU) distance and the feature vector distance of text targets as its starting point, fully exploits the potential correlation between video frames, and proposes a high-performance text tracking algorithm based on inter-frame data association. The main work and innovations of this dissertation are summarized as follows:

1) A text tracking algorithm based on inter-frame spatio-temporal complementary localization is proposed. Motion blur and illumination changes in video data make text targets harder to locate and cause text trajectories to break. To address this problem, this dissertation mines the correlation of video data in the temporal dimension and proposes a Siamese Complementary Network. The network uses the position of the text target in the previous frame to locate the target in the current frame, and fuses this result with the position probability map predicted by the text detector to obtain the text bounding box in the current frame. Compared with the baseline algorithm, the MOTA index improves by 1.9% and 14.82% on the Minetto and ICDAR 2015 Video datasets respectively, effectively reducing breaks in text trajectories.

2) A text tracking algorithm based on inter-frame feature metric learning is proposed. The visual similarity of text targets introduces ambiguity into the matching process, resulting in track ID switches. To address this problem, this dissertation designs a Text Similarity Learning Network that encodes the distinctive semantic information of each text instance, and adopts metric learning to constrain the feature distance of text targets between frames so that the network outputs discriminative text features. Compared with the baseline algorithm, the IDF1 index improves by 17.9% and 27.18% on the Minetto and ICDAR 2015 Video datasets respectively, effectively alleviating the track ID switch problem.

Combining these two complementary improvements, this dissertation proposes a text tracking algorithm based on inter-frame data association, which achieves the best performance among existing detection-based text tracking algorithms on both the Minetto and ICDAR 2015 Video datasets.
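
The abstract names two association cues, the IoU distance and the feature vector distance of text targets, but does not state how they are combined. The following Python sketch shows one common way such cues can be fused for inter-frame matching; it is an illustration under assumptions, not the dissertation's method. The function names, the weight `alpha`, the threshold `max_cost`, the use of cosine distance, and the greedy matching step (a simplification of the optimal assignment many trackers use) are all hypothetical choices made here.

```python
import math


def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def associate(tracks, detections, alpha=0.5, max_cost=0.6):
    """Match previous-frame tracks to current-frame detections.

    tracks:     dict of track_id -> (box, feature)
    detections: list of (box, feature)
    alpha and max_cost are assumed values, not taken from the thesis.
    Returns (matches, unmatched), where matches maps track_id to a
    detection index and unmatched lists the leftover detection indices.
    """
    candidates = []
    for tid, (tbox, tfeat) in tracks.items():
        for j, (dbox, dfeat) in enumerate(detections):
            # Weighted sum of IoU distance and feature distance.
            cost = (alpha * (1.0 - iou(tbox, dbox))
                    + (1.0 - alpha) * cosine_distance(tfeat, dfeat))
            if cost <= max_cost:
                candidates.append((cost, tid, j))

    # Greedy matching: accept the cheapest pairs first.
    candidates.sort()
    matches, used = {}, set()
    for cost, tid, j in candidates:
        if tid in matches or j in used:
            continue
        matches[tid] = j
        used.add(j)

    unmatched = [j for j in range(len(detections)) if j not in used]
    return matches, unmatched


# Toy example: two existing tracks, three detections in the new frame.
tracks = {
    1: ((10, 10, 50, 30), (1.0, 0.0)),
    2: ((100, 40, 160, 60), (0.0, 1.0)),
}
detections = [
    ((102, 41, 161, 61), (0.1, 0.9)),
    ((12, 11, 52, 31), (0.9, 0.1)),
    ((300, 300, 340, 320), (0.5, 0.5)),
]
matches, unmatched = associate(tracks, detections)
print(matches, unmatched)
```

In a sketch like this, an unmatched detection would typically start a new track, and a track left without a match is where a trajectory break can occur. The feature term is what lets the matcher separate nearby text instances with similar boxes, which relates to the ID switch problem the abstract describes.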

Keywords/Search Tags:

Spatio-temporal complementary, Metric learning, Video text tracking, Video text detection

Related items

1	Research On Text And Specific Object Detection Algorithm In Images And Videos
2	Research On The Technology Of Video Text Information Extraction
3	Research On The Generation Technology Of Academic Video Summarization Based On Spatio-temporal Subtitles
4	Research On Video Text Extraction And The Application In Virtual Karaoke
5	Research And Implementation Of Text Recognition In Video
6	Research On Video OCR
7	Action Recognition Algorithm Based On Spatio-Temporal Scene Graph And Its Application Research
8	Text Extraction In Video
9	Research On Surveillance Video Synopsis Based On Spatio-Temporal Slice
10	Study On Video Text Detection Based On Temporal Information