
Research On Text Segmentation In Digital Video

Posted on: 2006-02-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J F Xu
GTID: 1118360155453733
Subject: Computer applications
Abstract/Summary:
Information is becoming increasingly enriched by multimedia components. Libraries that were originally pure text are continuously adding images, videos, and audio clips to their repositories, and large digital image and video libraries are emerging as well. All of them need an automatic means of indexing and retrieving multimedia components efficiently. Most of the information available today is either on paper or in the form of still photographs and videos. The rapid growth of video data has created an urgent demand for efficient, truly content-based browsing and retrieval systems. To construct such systems, researchers in recent years have studied both low-level features, such as object shape, region intensity, color, texture, motion descriptors, and audio measurements, and high-level techniques, such as human face detection, speaker identification, and character recognition, for indexing and retrieving image and video information. Among these techniques, video caption based methods have attracted particular attention because of the rich content information contained in caption text. Caption text routinely provides valuable indexing information such as scene locations, speaker names, program introductions, sports scores, special announcements, dates, and times. Compared with other video features, the information in caption text is highly compact and structured, and is therefore more suitable for efficient video indexing. Text detection and recognition in videos can substantially aid video content analysis and understanding, since text provides a concise and direct description of the stories presented in the videos. In digital news videos, the superimposed captions usually present the name of the person involved and a summary of the news event. Hence, the recognized text can become part of the index in a video retrieval system.
Systems that automatically extract and recognize text from images with general backgrounds are also useful in many situations. Text found in images or videos can be used to annotate and index those materials. For example, video sequences of events such as a basketball game can be annotated and indexed by extracting a player's number, name, and team name as they appear on the player's uniform. In contrast, image indexing based on image content, such as the shape of an object, is difficult and computationally expensive. Systems that automatically register stock certificates and other financial documents by reading specific text information in the documents are also in demand, because manual registration of the large volume of documents generated by daily trading requires tremendous manpower. Current OCR technology is largely restricted to finding text printed against clean backgrounds and cannot handle text printed against shaded or textured backgrounds or embedded in images. More sophisticated text reading systems usually employ document analysis (page segmentation) schemes to identify text regions before applying OCR, so that the OCR engine does not spend time trying to interpret non-text items. Most such schemes, however, require clean binary input; some assume specific document layouts such as newspapers and technical journals; others utilize domain-specific knowledge such as mail address blocks or configurations of chess games. Extracting captions embedded in video frames is therefore not a trivial task. In...
Keywords/Search Tags: video text segmentation, shot segmentation, text tracking, text enhancement, car license plate recognition