| Semantic video management, including video browsing, indexing and retrieval, is necessary for the effective use of video repositories. Video content analysis aims to bridge the semantic gap between low-level features and high-level concepts, and to provide an accessible way to organize and manage video data. This dissertation concentrates on audio, caption and visual content analysis, and on multimodal information fusion techniques for news video, using pattern recognition models. The three main contributions are as follows: (1) A novel anchorperson shot detection algorithm in the MPEG domain is proposed, which introduces an improved compressed-domain face detection method and a new dissimilarity metric for clustering. The proposed algorithm is effective and computationally efficient. (2) A new video shot classification method based on a decision tree is proposed. Six semantic types are studied and categorized: Commercial, Others, Still Image, Anchorperson, Reporter and Monologue. The first three types are identified using black-frame, motion activity, shot duration and face features. Anchorperson shots are detected by a clustering method. Reporter and monologue shots are distinguished by a conditional random fields (CRFs) model, in which detection is cast as a sequence labeling problem using audio, face, motion and temporal information. The experimental results demonstrate the effectiveness and high performance of the method. (3) A novel news story segmentation method is proposed that fuses multimodal information from the results of audio classification, caption extraction and video shot classification. The video shot sequence is transformed into several keyword sequences, so that news story segmentation is treated as a sequence segmentation problem. A CRFs model is employed to fuse the context information within and between the keyword sequences.
Experiments show that the approach is feasible and achieves better results. In addition, various video content analysis techniques are surveyed, a layered audio classification method based on rules and hidden Markov models (HMMs) is developed, a caption extraction framework for news video is designed and implemented, and a COM-based video content analysis and abstraction system is devised and implemented. Overall, the dissertation provides an in-depth investigation into semantic concept detection and multimodal information fusion. |