It is widely used in automated surveillance, video indexing, human-computer interaction, traffic monitoring, and video detection and recognition. Most existing methods are based on detection (tracking-by-detection). However, this approach treats detection and tracking as two separate processes and does not use tracking information during detection, which leads to many false and missed detections and a heavy computational cost. In particular, this dissertation studies an application of multi-object tracking in which complex video text (embedded caption text and scene text) is the object to be tracked. Traditional text detection methods focus on detecting text in a single image, but video text is temporally redundant, so multi-object tracking can be used to improve the accuracy of text detection. Moreover, the motion of scene text is complex, and a single tracking algorithm cannot handle it effectively.

To deal with false detections, missed detections, and a high computational burden, this dissertation proposes the Multi-object Tracking with inter-feedback between Detection and Tracking (MTDT) algorithm, which uses feedback from tracking to detection to reduce false and missed detections and to improve the efficiency of both detection and tracking. To exploit temporal redundancy and to handle the complex motion of scene text, this dissertation builds on the proposed multi-object tracking algorithm, combines it with tracking-based text detection and a multi-strategy prediction algorithm, and proposes an MTDT-based embedded caption text detection and tracking method and an MTDT-based scene text detection and tracking method.

The specific contents and contributions of this dissertation are as follows:

(1) A multi-object tracking algorithm is proposed by introducing online inter-feedback information between the detection and tracking processes into the tracking-by-detection method.
The tracking algorithm consists of two iterative components: detection guided by feedback from tracking, and tracking based on detection. In the tracking step, a detection-based group tracking strategy is used. Moreover, to handle tracking scenarios of different complexity, objects are classified into two categories, single objects and multiple objects, and each category is handled with a different strategy. In the detection step, objects are detected by detectors that are adjusted using information from tracking. The object type allows fewer detectors to be used, and the scale and predicted location restrict each detector to a smaller region, which reduces false detections and computational complexity. The proposed algorithm is evaluated on several real surveillance videos and public datasets, and the experimental results demonstrate its performance and efficiency.

(2) An MTDT-based method for detecting and tracking embedded caption text in video is proposed. The MTDT algorithm is adapted to embedded caption text detection in order to improve detection precision by making full use of temporal redundancy. Color features, a motion model, and contour features are used to compute the similarity between detections and trajectories, and the Hungarian algorithm is then used to solve the data association problem. Thereafter, to obtain higher recall and more precise detection locations, we propose a tracking-based text detection method that includes verification of false detections, text detection based on predicted positions, and correction of the size of the text rectangle. Moreover, a challenging video text (embedded caption text) database, USTB-VidTEXT, is constructed. A variety of experiments on this dataset verify that the proposed approach not only outperforms the state-of-the-art method but also has good extensibility and real-time performance.

(3) An MTDT-based method for detecting and tracking scene text in video is proposed.
Building on the embedded caption text detection and tracking method, we improve the prediction of the text location in the next frame. To handle the complex motion of scene text, we propose a rule-based multi-strategy prediction algorithm in which linear prediction, the STC algorithm, and the SURF+RANSAC algorithm are each used to predict the text location, and the predicted locations are then combined according to rules to obtain the best result. The algorithm handles different types of text with different methods, which effectively improves the precision of text tracking. Experiments on the ICDAR2015 dataset verify that the method performs well on all three metrics (MOTP, MOTA, and ATA).

(4) Combining the proposed text detection and tracking methods, we design a complex video text detection and recognition system. For text detection, the proposed MTDT-based text detection and tracking algorithms are used to locate the text. For text recognition, we propose a tracking-based text recognition method in which inter-feedback between tracking and recognition is used to reduce the negative effects of text tracking errors. First, text is recognized in each single image. Second, the trajectories obtained by tracking are over-segmented in the temporal domain to ensure that all objects belonging to one trajectory are the same text. Then, an agglomerative hierarchical clustering algorithm is designed and applied to obtain refined text trajectories. Finally, a voting strategy is used to obtain the final recognition results. Experiments on the USTB-VidTEXT and ICDAR2015 datasets demonstrate the effectiveness of our method.
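
The final voting step can be illustrated with a minimal sketch. The abstract does not specify the voting rule, so confidence-weighted majority voting is assumed here purely for illustration; the function name `vote_trajectory_text` and the `(text, confidence)` input format are hypothetical and are not taken from the dissertation.

```python
from collections import defaultdict


def vote_trajectory_text(frame_results):
    """Pick one transcription for a refined trajectory by weighted voting.

    frame_results: list of (text, confidence) pairs, one per frame of the
    trajectory (hypothetical format, assumed for this sketch).
    Returns the winning text, or None if no frame produced any text.
    """
    scores = defaultdict(float)
    for text, confidence in frame_results:
        text = text.strip()
        if text:  # skip frames where recognition returned nothing
            scores[text] += confidence
    if not scores:
        return None
    # The candidate with the highest total confidence wins; on a tie,
    # the candidate that appeared first in the trajectory is kept.
    return max(scores, key=scores.get)


# Per-frame recognition results for one trajectory; a few frames are misread.
results = [("OPEN", 0.9), ("0PEN", 0.4), ("OPEN", 0.8), ("OPFN", 0.3), ("", 0.0)]
print(vote_trajectory_text(results))  # OPEN
```

Under this assumption, occasional per-frame misreadings are outvoted as long as the correct transcription accumulates the most confidence across the trajectory, which is why the earlier over-segmentation and clustering steps matter: voting is only meaningful when every frame in a trajectory shows the same text.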