
Research On The Key Technologies Of Semantic Based Video Browsing System

Posted on: 2008-08-17    Degree: Doctor    Type: Dissertation
Country: China    Candidate: X M Qian    Full Text: PDF
GTID: 1118330338977049    Subject: Liu Guizhong
Abstract/Summary:
With the development of information science and technology, video resources are becoming an indispensable part of people's daily life. While we enjoy the convenience of information technology, a vexing problem arises at the same time: how to find video segments of interest among the huge video databases on the Internet. It is a challenging problem for researchers in video analysis, indexing, and retrieval. For flexible video search in the management of large-scale video databases, one effective approach is to parse video sequences into semantic shots and to mine semantic objects, concepts, and the causalities among them according to certain rules. Browsing the content of a video sequence as one browses a book is a popular approach, in which viewers can easily find what they want by consulting the table of contents of the video database. We have accomplished some fundamental work toward semantic-based video analysis, indexing, retrieval, and browsing, summarized as follows:

(1) Feature analysis and extraction using compressed-domain information. Video sequences are usually compressed with standards such as MPEG to reduce redundancy, which is a fundamental step for video transmission and storage. Using compressed-domain information for feature analysis and extraction can speed up video content analysis, indexing, retrieval, and browsing. We employ the AC coefficients of a block to extract its texture information, and use the DC image to approximately represent the original frame. Moreover, the motion vector field of the compressed video is utilized to compute motion-related information.

(2) Scene change detection, scene change type recognition, and semantic-based shot classification. Scene change detection, also known as shot boundary detection, is one of the fundamental problems in video analysis, indexing, and retrieval. Moreover, the shot is the minimum unit used in video editing and production; editors and movie makers convey ideas through the connection of different shots. Starting from the mathematical models of flashlights and fade-in/out, their characteristic behavior in the accumulated histogram difference is deduced, which not only helps detect them effectively but also recognizes their types at the same time. Cuts, dissolves, and other types of scene change are then detected effectively after removing the well-detected flashlights and fade-in/out. Semantic-based shot classification provides mid-level semantic information for video content analysis and retrieval; the motive of a story unit or event clip can be inferred from the semantic shot type information. By integrating domain knowledge, video production knowledge, and camera motion pattern information, a soccer video is segmented into a set of semantic shots. Moreover, audio data is classified into five categories according to temporal-domain and spectral-domain features: silence, pure speech, speech with background noise, speech with music background, and pure music. The semantic shot classification and audio classification are fundamental steps toward the detection and classification of high-level semantic events and story units.
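As a rough illustration of the histogram-difference idea underlying shot boundary detection (not the dissertation's actual accumulated-histogram-difference algorithm or compressed-domain implementation; the function names, threshold, and use of plain NumPy arrays are assumptions), a minimal sketch might look like this:

```python
import numpy as np

def gray_histogram(frame, bins=64):
    """Normalized gray-level histogram of one (e.g., DC) image."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 255))
    return hist / max(hist.sum(), 1)

def candidate_cuts(frames, threshold=0.5):
    """Flag frame indices where the histogram difference between
    consecutive frames exceeds a fixed threshold (candidate cuts)."""
    cuts = []
    prev = gray_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i])
        # L1 distance between consecutive normalized histograms
        if np.abs(cur - prev).sum() > threshold:
            cuts.append(i)
        prev = cur
    return cuts
```

A real system would accumulate the differences over a sliding window and model flashlights and fade-in/out separately, as described above, rather than relying on a single fixed threshold.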
(3) Text detection, localization, tracking, segmentation, and type classification. We use the AC coefficients of a block in the MPEG compressed domain to represent its texture information, and carry out fast text detection, localization, and tracking to determine the starting and ending frames of each text. The foreground and background of video texts are jointly exploited in text segmentation, which effectively reduces the influence of complex backgrounds. The text detection performance using H.264/AVC and MPEG compressed-domain information is also compared. Detected overlaid texts are classified into rolling text, long-term text, speech-content text, and title text according to their motion activities and lifetimes. The text type information is fused into semantic-based event and story unit classification and into text-oriented video abstraction.

(4) Global motion estimation, camera-motion-based shot refinement, and global/local motion based applications. An MV-group-based global motion estimation method and a genetic-algorithm-based method, both operating on the verified motion vector field of a compressed video, are proposed. From the estimated camera motion information, the global shots of a soccer video are further classified into several semantic categories. Moreover, a GM/LM-based method for recovering text-occluded regions and for error concealment in video transmission systems is proposed, which improves the recovery results effectively compared with using GM or LM alone.

(5) Semantic-based event and story unit detection and classification. The boundaries of each event clip and story unit are determined adaptively according to domain knowledge and production effects before their types are recognized. Semantic shot type information, the text type information of an event clip, domain knowledge, and video production knowledge are integrated for soccer video event detection and classification. Each event clip is classified into one of five types: shots, goals, fouls, placed kicks, and normal kicks. Moreover, the highlight events are further attributed to the corresponding team according to the dominant camera motion pattern. For news video, each story unit is classified into one of nine categories according to multi-modal audio-visual cues. The semantic-based event and story unit classification results provide a possible way to narrow the semantic gap in video indexing and retrieval.

(6) A unified video indexing, retrieval, and browsing framework, analogous to the ToC of a book, is proposed based on the event and story unit classification results. It is a heuristic framework that provides a three-layer video abstraction structure. Viewers can browse the video content as they would read a book, navigate freely within it, and locate segments of interest effectively.
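To make the three-layer ToC-style abstraction concrete, the sketch below shows one way such a structure could be represented; the class and field names are hypothetical and chosen for illustration only, not taken from the dissertation's framework:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Shot:
    start_frame: int
    end_frame: int
    semantic_type: str          # e.g. "global", "close-up"

@dataclass
class EventClip:
    label: str                  # e.g. "goal", "foul"
    shots: List[Shot] = field(default_factory=list)

@dataclass
class StoryUnit:
    title: str                  # entry shown in the video's table of contents
    events: List[EventClip] = field(default_factory=list)

def table_of_contents(units: List[StoryUnit]) -> List[str]:
    """Flatten the three-layer structure into browsable ToC entries."""
    entries = []
    for unit in units:
        entries.append(unit.title)
        entries.extend(f"  {event.label}" for event in unit.events)
    return entries
```

In such a layout, story units play the role of chapters, event clips the role of sections, and shots the role of pages, which is what allows viewers to navigate the video the way they would consult a book's table of contents.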
Keywords/Search Tags: Semantic video analysis, Video abstraction, Video browsing, Video retrieval