Research On Video Text Extraction And The Application In Virtual Karaoke

Posted on: 2012-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wang
Full Text: PDF
GTID: 2178330335462895
Subject: Computer software and theory
Abstract/Summary:
Captions in video carry rich information about video content, so caption extraction is important for image understanding and content-based information retrieval systems. If these text occurrences could be detected, segmented, and recognized automatically, they would be a valuable source of high-level semantics for indexing and retrieval. Text in video differs from text in ordinary documents, which OCR can recognize directly. The size, shape, and color of video text vary, and the text is usually embedded in a complex background. These factors make extraction more challenging. The focus of this thesis is how to exploit the characteristics of video captions, building on existing research, to improve text extraction.

Captions in videos often span tens or even hundreds of frames, and many researchers have exploited this temporal redundancy to improve caption detection accuracy and caption region quality. This thesis presents a new method for caption monitoring and tracking. First, caption regions are obtained by edge detection; the bitmap of each caption region is then used as a signature to track the static caption object across frames. This method can refine the appearance of the text regions, avoids running detection and recognition on every frame, and reduces the computational cost. For text segmentation, multi-frame integration and image interpolation are used to enhance text regions by removing the complex background.

A new method is proposed that uses several caption features (temporal and spatial features, edges, and color) to extract captions in digital video that may contain multiple colors. First, the caption regions are located by edge detection, so that the text colors can be determined. Then a universal Gaussian mixture model (GMM) is trained on the text colors. Finally, the color layer of the text is extracted using the trained GMM. Whether the caption content has changed is judged by applying a mask bitmap to the frame.
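The mask-bitmap check mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact criterion, so this sketch assumes that the caption's edge bitmap is stored as a signature and that a caption counts as changed when too few signature pixels are still edge pixels in a later frame. The names `mask_overlap` and `caption_changed` and the threshold of 0.7 are hypothetical, not taken from the thesis.

```python
from typing import List

Bitmap = List[List[int]]  # 1 = caption (edge) pixel, 0 = background


def mask_overlap(mask: Bitmap, edges: Bitmap) -> float:
    """Fraction of the mask's set pixels that are also set in `edges`."""
    total = 0
    kept = 0
    for mask_row, edge_row in zip(mask, edges):
        for m, e in zip(mask_row, edge_row):
            if m:
                total += 1
                if e:
                    kept += 1
    return kept / total if total else 0.0


def caption_changed(mask: Bitmap, edges: Bitmap, threshold: float = 0.7) -> bool:
    """Assumed rule: the caption changed if too few signature pixels survive."""
    return mask_overlap(mask, edges) < threshold


# Signature bitmap taken from the frame where the caption was first detected.
signature = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
]
# Later frame: same strokes plus some background edges.
same_caption = [
    [1, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 1],
]
# Later frame: different strokes in the same region.
new_caption = [
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
]

print(caption_changed(signature, same_caption))  # False
print(caption_changed(signature, new_caption))   # True
```

Because only the signature pixels are compared, background edges that appear around an unchanged caption do not trigger a new detection pass, which is consistent with the stated goal of avoiding detection on every frame.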
Experiments show that this GMM-based method performs well even when the background is complex and the text has more than one color.

A virtual karaoke system is designed that combines caption extraction with human segmentation. A method based on the wavelet transform and morphology is proposed for caption detection in karaoke video. Haar wavelets are used to decompose the frame, and morphological opening and closing are then used to remove noise. The diagonal (skew) high-frequency sub-image is used to locate text. Because this method is not sensitive to color, it can locate captions effectively. Background subtraction based on a Gaussian model is used to detect the human. Blurring can be applied to the final result.

Experimental results show that the proposed methods perform well in detecting and recognizing different types of text.
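The Haar decomposition step described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes a grayscale frame, reads "skew high frequency" as the diagonal (HH) sub-band of a one-level 2D Haar transform, and marks candidate text positions with a hypothetical threshold. The morphological opening and closing step is omitted, and the names `haar_diagonal` and `text_candidates` are illustrative.

```python
from typing import List

Image = List[List[float]]  # grayscale intensities, row-major


def haar_diagonal(image: Image) -> Image:
    """One-level 2D Haar transform, returning only the diagonal (HH) sub-band.

    Each non-overlapping 2x2 block [[a, b], [c, d]] yields (a - b - c + d) / 2.
    A trailing odd row or column is ignored.
    """
    rows = len(image) // 2
    cols = len(image[0]) // 2 if image else 0
    hh = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            a = image[2 * i][2 * j]
            b = image[2 * i][2 * j + 1]
            c = image[2 * i + 1][2 * j]
            d = image[2 * i + 1][2 * j + 1]
            out_row.append((a - b - c + d) / 2)
        hh.append(out_row)
    return hh


def text_candidates(image: Image, threshold: float = 50.0) -> List[List[int]]:
    """Mark sub-band positions whose diagonal response is strong (assumed rule)."""
    return [
        [1 if abs(v) >= threshold else 0 for v in row]
        for row in haar_diagonal(image)
    ]


# Left half: flat background. Right half: sharp stroke-like intensity changes.
frame = [
    [100, 100, 255, 0],
    [100, 100, 0, 255],
    [100, 100, 0, 255],
    [100, 100, 255, 0],
]

print(haar_diagonal(frame))    # [[0.0, 255.0], [0.0, -255.0]]
print(text_candidates(frame))  # [[0, 1], [0, 1]]
```

The response depends only on intensity differences inside each block, not on the absolute color of the text, which matches the abstract's remark that the approach is not sensitive to color.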
Keywords/Search Tags:video retrieval, text detection, video text location, text segmentation, text recognition, background modeling