
Research On The Technology Of Video Text Information Extraction

Posted on: 2013-05-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L J Li
Full Text: PDF
GTID: 1228330377959262
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Video is a media type that integrates multiple modalities such as image, text, and sound, and it is characterized by a large data volume and rich information content. With the development of computing, multimedia, and network technology, video data has expanded rapidly, and traditional video content analysis based on manual annotation can no longer meet the needs of managing and retrieving huge amounts of video data. Content-based video retrieval technology emerged from the wish to extract video content automatically by computer. However, video files store low-level information about target objects, such as color, brightness, and position, as pixels in an unstructured organization; they lack an intuitive description of high-level semantic information while having a huge data volume and diverse content. How to automatically extract high-level semantic content from video data has therefore become a hot topic in the automatic and intelligent management and retrieval of video data. Text in video is not only highly relevant to the video content, providing important clues for the automatic understanding of that content, but is also much easier to extract than other information, so the automatic extraction and recognition of text information in video is of great significance for research on content-based video retrieval.

However, texts embedded in complex backgrounds often differ in language, font, and color even within the same video, which makes extracting them complicated. This dissertation focuses on the crucial technologies of video text extraction: the detection and localization of text regions in video images, the tracking of the same text region across consecutive frames, and the segmentation of text characters.

For the detection and localization of video text, a method combining Wavelet features with Local Binary Pattern (LBP) features is proposed. First, candidate text regions are detected according to edge density and corner density. Next, text objects are described by combining Wavelet and LBP features, and the feature dimension is reduced by Isometric Mapping, a manifold learning technique. Finally, text regions are classified accurately with a Support Vector Machine, and single text lines are localized precisely based on a gradient density map. The algorithm accomplishes the detection and localization of text regions in video through multiple features and multiple stages.

To improve the efficiency of video text detection, a text region tracking algorithm based on template matching is proposed. The Normalized Cross-Correlation measure serves as the template matching criterion, the edge image obtained after Wavelet reconstruction of the text region serves as the matching template, and a pyramid strategy provides hierarchical matching. By exploiting the temporal redundancy of video, the tracker avoids running detection and localization on every frame and accelerates the extraction of text information from the entire video.
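The tracking step can be illustrated with a minimal sketch. The snippet below uses OpenCV's normalized cross-correlation template matching to relocate a previously detected text region in a later frame; the wavelet-reconstructed edge template and pyramid matching described above are omitted, and the function name and parameters (search_margin, ncc_threshold) are illustrative assumptions rather than the dissertation's implementation.

```python
import cv2


def track_text_region(prev_frame, bbox, next_frame, search_margin=20, ncc_threshold=0.8):
    """Relocate the text region `bbox` = (x, y, w, h) from `prev_frame` in `next_frame`
    using normalized cross-correlation template matching.
    Returns the new bounding box, or None if the best match is too weak."""
    x, y, w, h = bbox
    template = cv2.cvtColor(prev_frame[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)

    # Restrict the search to a window around the previous location to exploit
    # temporal redundancy: caption text moves little between consecutive frames.
    H, W = next_frame.shape[:2]
    x0, y0 = max(0, x - search_margin), max(0, y - search_margin)
    x1, y1 = min(W, x + w + search_margin), min(H, y + h + search_margin)
    search = cv2.cvtColor(next_frame[y0:y1, x0:x1], cv2.COLOR_BGR2GRAY)

    # Normalized cross-correlation is robust to uniform brightness changes.
    result = cv2.matchTemplate(search, template, cv2.TM_CCORR_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val < ncc_threshold:
        return None  # text disappeared or changed; fall back to detection
    return (x0 + max_loc[0], y0 + max_loc[1], w, h)
```

When the match score falls below the threshold, the caller would re-run the detection and localization stage on the current frame, as the dissertation's pipeline suggests.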
Text in video usually appears against a complex background, so a video text segmentation algorithm based on multi-frame integration is proposed. First, images with relatively simple backgrounds are chosen from the sequence of the same text and combined, and a negative-polarity text image suitable for OCR is obtained after polarity judgment. Considering the structural diversity of character strokes, the traditional two-dimensional maximum conditional entropy criterion is improved by jointly considering gray-scale and edge features: a two-dimensional maximum conditional entropy based on the Non-Subsampled Contourlet Transform is used as the fitness function, and the optimal segmentation threshold is found by exploiting the effective global optimization ability of Bacterial Foraging Optimization. The algorithm effectively reduces the influence of complex backgrounds on text segmentation and improves both the accuracy of the segmentation threshold and the recognition rate of video text.

In addition, a video text segmentation method based on the Pulse Coupled Neural Network (PCNN) is proposed after a thorough study of PCNN's favorable properties for image segmentation. For video text segmentation, the parameters and output criterion of a simplified PCNN model are improved. Unlike traditional threshold segmentation methods, the PCNN-based method effectively reduces the differences between adjacent pixels with similar gray values during segmentation. The segmentation methods introduced in this dissertation are effective and feasible, and they are robust to complex backgrounds.
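As a rough illustration of the segmentation idea, the sketch below averages aligned crops of the same caption across frames to suppress the changing background and then picks a global threshold by maximizing a one-dimensional (Kapur-style) entropy criterion. It is a simplified stand-in, assuming bright text on a darker background; the two-dimensional conditional entropy, NSCT edge feature, Bacterial Foraging Optimization, and the PCNN model used in the dissertation are not reproduced here, and both function names are hypothetical.

```python
import numpy as np


def fuse_text_frames(frames):
    """Average aligned grayscale crops of the same caption across frames.
    Static text reinforces itself while the moving background is blurred out."""
    return np.mean(np.stack(frames).astype(np.float64), axis=0)


def max_entropy_threshold(gray):
    """Simplified 1-D maximum-entropy threshold selection: return the gray level
    that maximizes the sum of the foreground and background histogram entropies."""
    hist, _ = np.histogram(gray.ravel(), bins=256, range=(0, 256))
    p = hist / hist.sum()
    best_t, best_h = 0, -np.inf
    for t in range(1, 255):
        pb, pf = p[:t].sum(), p[t:].sum()
        if pb <= 0 or pf <= 0:
            continue  # skip degenerate splits
        b, f = p[:t] / pb, p[t:] / pf
        hb = -np.sum(b[b > 0] * np.log(b[b > 0]))
        hf = -np.sum(f[f > 0] * np.log(f[f > 0]))
        if hb + hf > best_h:
            best_t, best_h = t, hb + hf
    return best_t


# Usage sketch: fused = fuse_text_frames(crops)
#               binary = fused >= max_entropy_threshold(fused)  # bright text assumed
```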
Keywords/Search Tags:Video text, Text Detection and Localization, Text Tracking, Text Segmentation