
Research On Video Structure Analysis And Automatic Cataloging Technique

Posted on: 2014-01-06 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: G Y Gao
GTID: 1228330401463070 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of multimedia technology and computing power, people face a vast ocean of digital information. At the same time, because this content is rich and diverse and has a high-dimensional structure in both time and space, people have begun to ask how to organize such massive data effectively and how to find the content that interests them as quickly as possible. Meanwhile, movies and TV series delivered over broadcast television and networks have become an important part of daily life. At present, managing, organizing, browsing and indexing these videos effectively still relies on human assistance for most of the processing, including multimedia structure analysis, semantics extraction and cataloging. With so many movie and television resources to be processed, video segmentation and semantic annotation therefore consume a great deal of manpower and material resources. Moreover, as these resources grow exponentially, manual methods can no longer meet the needs of content producers and users. This motivates us to introduce several video processing methods for movies and TV series, covering structure analysis, content understanding and automatic cataloging. These methods can save considerable money, time and other resources, and improve the efficiency of broadcast program production.
In this dissertation, we study video structure analysis and automatic cataloging. To handle large-scale video data, video structure analysis is first used to segment the video into independent logical units. Then, through a series of semantic content analysis techniques, important semantic information is extracted for automatic video parsing and cataloging.
More specifically, to address the problems in video structure analysis and video cataloging, we propose a series of methods: shot boundary detection, video scene detection, video scene recognition and movie character identification. With these methods, we mainly address two questions: 1) how to perform video structure analysis efficiently in a complex video environment; 2) how to accurately extract the basic semantic content needed for automatic video cataloging. The main contributions of this dissertation are as follows:
(1) For video structure analysis, a fast and efficient shot boundary detection algorithm is necessary, especially for real-time video processing applications. Extensive prior work has pursued accurate shot boundary detection at the expense of high computational cost. We therefore propose a fast shot boundary detection method based on a Focus Region (FR) definition and adaptive skip searching. The method reduces computation both pixel-wise and frame-wise, by shrinking the detection region and the search scope, while still giving satisfactory accuracy. Color histograms and mutual information are used together to measure the difference between frames, and the corner distribution of frames is used to exclude most false boundaries.
(2) Scene detection is a fundamental step for efficiently accessing and browsing movies and TV series.
Therefore, we first propose a method that segments a movie into scenes using fused visual and audio features: 1) the movie is first segmented into shots, and key frames are then extracted; 2) because feature films are often shot in open, dynamic environments with moving cameras and continuously changing content, we focus on extracting the association between visual and audio features; 3) these features are fused for scene detection using Kernel Canonical Correlation Analysis (KCCA); 4) spatially and temporally coherent shots form a similarity graph, which is partitioned to generate the scene boundaries. Second, many existing scene recognition methods, which address the problem of recognizing semantic scene labels (e.g. bedroom, street), focus on static images and do not give satisfactory results on videos. We therefore propose a robust movie scene recognition approach that uses panoramic frames and representative feature patches, and that exploits the correlations between video clips to improve the final recognition performance.
(3) Automatically identifying characters in movies has attracted researchers' interest and led to several significant applications. However, it remains a challenging problem because of the large variation in character appearance and the weakness and ambiguity of the available annotation. In this dissertation, we investigate this problem under the supervision of the actor-character name correspondence provided by the movie cast.
Our proposed framework, named Cast2Face, has the following features: (i) the assigned names are restricted to the set of character names in the cast; (ii) for each character, we use the corresponding actor and movie names as keywords to retrieve a group of face images from Google image search, which form the gallery set; (iii) the probe face tracks in the movie are then identified as one of the actors by a robust kernel multi-task joint sparse representation and classification method; and (iv) a Conditional Random Field (CRF) model that takes the constraints between face tracks into account is introduced to refine the final labeling. The actor name assigned to a face track is then mapped back to the character name using the cast.
(4) Finally, to verify the effectiveness of the proposed methods, we design an automatic video cataloging system based on the results of video structure analysis and semantic content extraction. Extensive experiments show that the proposed methods deal accurately and efficiently with problems in video structure and content analysis, and provide more intelligent cataloging content. The methods and the system offer practical assistance to broadcast producers and end users.
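
As a rough illustration of the frame-difference idea in contribution (1), the sketch below combines a histogram distance with mutual information and compares frames a fixed number of steps apart, bisecting only when a change is seen. It is not the dissertation's algorithm: the abstract gives no formulas, so the central-crop `focus_region`, the grayscale histograms (the abstract mentions color histograms), the fixed `step`, the bisection, and both thresholds are assumptions made for this example, and the corner-distribution check is omitted.

```python
import numpy as np

def focus_region(frame, keep=0.5):
    """Hypothetical Focus Region: the central `keep` fraction of each axis."""
    h, w = frame.shape[:2]
    dh, dw = int(h * (1 - keep) / 2), int(w * (1 - keep) / 2)
    return frame[dh:h - dh, dw:w - dw]

def histogram_difference(a, b, bins=32):
    """Half the L1 distance between normalized intensity histograms, in [0, 1]."""
    ha, _ = np.histogram(a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(b, bins=bins, range=(0, 256))
    return 0.5 * np.abs(ha / ha.sum() - hb / hb.sum()).sum()

def mutual_information(a, b, bins=32):
    """Mutual information (in nats) between the pixel intensities of two frames."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins,
                                 range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of frame a
    py = pxy.sum(axis=0, keepdims=True)   # marginal of frame b
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px * py)[nz])).sum())

def is_boundary(a, b, hist_thresh=0.5, mi_thresh=0.5):
    """Assumed rule: histograms differ strongly and little information is shared."""
    a, b = focus_region(a), focus_region(b)
    return (histogram_difference(a, b) > hist_thresh
            and mutual_information(a, b) < mi_thresh)

def detect_cuts(frames, step=8):
    """Skip search: test frames `step` apart, bisect only where a change appears."""
    cuts, i, n = [], 0, len(frames)
    while i < n - 1:
        j = min(i + step, n - 1)
        if is_boundary(frames[i], frames[j]):
            lo, hi = i, j
            while hi - lo > 1:            # narrow down to the first changed frame
                mid = (lo + hi) // 2
                if is_boundary(frames[lo], frames[mid]):
                    hi = mid
                else:
                    lo = mid
            cuts.append(hi)
        i = j
    return cuts

# Synthetic example: ten dark frames followed by ten bright frames.
rng = np.random.default_rng(0)
frames = ([rng.integers(0, 64, size=(64, 64)) for _ in range(10)]
          + [rng.integers(192, 256, size=(64, 64)) for _ in range(10)])
print(detect_cuts(frames))
```

Cropping reduces the number of pixels examined per comparison, and skipping reduces the number of frame pairs compared, which mirrors the pixel-wise and frame-wise savings the abstract describes. This simplified version finds at most one cut per skipped window and handles abrupt cuts only.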
Keywords/Search Tags: video structure analysis, shot boundary detection, scene detection, video scene recognition, face recognition and character identification