
Research on Video Representation towards Video Understanding

Posted on: 2016-10-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S J Hou
Full Text: PDF
GTID: 1108330503452349
Subject: Computer Science and Technology
Abstract/Summary:
The explosive growth of online multimedia data has brought new challenges to video data transmission, management and analysis. In computer vision, researchers try to use computers in place of the human visual sense to process visual information and thereby achieve automatic understanding. Video understanding studies how computer systems can interpret and analyze video images so as to emulate the way the human visual system understands the external world. Its main tasks are to segment and recognize video content, to extract useful information, to associate this information with the semantic application environment, and to derive the appropriate application. Because video has a complex structure and rich semantic information, video understanding has become an important but difficult topic in video analysis.
In various video applications, such as video indexing, annotation and classification, people judge video similarity not by the similarity of low-level visual features between frames but by high-level semantic themes, e.g. objects, scenes or events. However, this understanding cannot be obtained directly from the visual features of video frames. A computer's understanding of video is based on low-level features such as color, texture and shape. It is precisely this difference between how people and computers understand video that causes the ‘semantic gap’. A concise but effective video representation not only benefits compressed video storage but also enables efficient retrieval and management.
At present, how to fill the ‘semantic gap’ in video understanding and mine valuable information with specialized knowledge, so as to promote effective video management and analysis, has become an important research subject.
We focus on narrowing the ‘semantic gap’ in video content understanding.
Specifically, video representation and its applications are studied. First, a weighted video representation based on global features and its application in video retrieval are studied. Then, a local-feature-based video representation is built from multi-domain, multi-view and multi-layer perspectives to bridge video metadata and classification applications. Lastly, a representation paradigm for a special type of video and its application are also studied. The following points highlight several contributions of our work:
1. A novel weighted representation model for video frames in the DCT domain is proposed, and on that basis a hierarchical similarity measure model is presented. Experiments applying the proposed model to several different kinds of natural videos for Query by Example Video Retrieval (QEVR) demonstrate the effectiveness of the proposed approach.
2. A novel Multi-Layer Multi-View Topic Model (mlmv_LDA) is proposed, which compensates for the deficiencies of global-feature-based video representation and narrows the ‘semantic gap’ in video understanding. The proposed mlmv_LDA model not only enriches the visual information but also fuses latent semantic concepts. It can effectively avoid the one-sidedness of a single view while supporting semantic content understanding of video. We provide empirical results on 10111 real-world advertising videos, which demonstrate the effectiveness of the proposed model.
3. A novel advertising video representation is proposed, which integrates the posterior occurrence probabilities of both brand information and high-level object information into a unified Latent Dirichlet Allocation learning paradigm (posterior probability involved in LDA, pp LDA).
With regard to specific types of video, considering that advertising videos are gradually becoming the most popular medium for business, we focus on the representation and classification of advertising video in this dissertation.
4. A novel multi-label learning paradigm based on high-level semantic representation is devised, named Directed Probability Label Graph (DPLG). It mainly targets videos that contain specific objects or tags; in this dissertation, advertising video is treated as the typical example. The interdependencies between labels are considered in DPLG. Experiments on several publicly available datasets and advertising videos demonstrate the effectiveness of the proposed method for multi-label annotation with label relevance.
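To make the idea of label interdependency concrete, the following is a minimal sketch of one way a directed, probability-weighted label graph could be estimated from multi-label annotations. The abstract does not say how DPLG defines its edges, so everything here is an assumption for illustration: the function name `build_label_graph`, the toy labels, and the choice to weight an edge a -> b by the conditional co-occurrence frequency P(b | a).

```python
from collections import Counter
from itertools import permutations


def build_label_graph(label_sets):
    """Estimate directed edge weights P(b | a) from label co-occurrence.

    Assumption (not from the dissertation): an edge a -> b is weighted by
    the fraction of samples tagged with `a` that are also tagged with `b`.
    """
    single = Counter()  # number of samples carrying each label
    pair = Counter()    # number of samples carrying each ordered label pair
    for labels in label_sets:
        unique = set(labels)
        single.update(unique)
        pair.update(permutations(unique, 2))
    return {(a, b): n / single[a] for (a, b), n in pair.items()}


# Hypothetical toy annotations for four advertising videos.
videos = [
    {"car", "outdoor"},
    {"car", "outdoor", "family"},
    {"car", "city"},
    {"food", "family"},
]

graph = build_label_graph(videos)
for (a, b), p in sorted(graph.items()):
    print(f"P({b} | {a}) = {p:.2f}")
```

In this toy data the edge weights are asymmetric: every video tagged "outdoor" is also tagged "car", but only two of the three "car" videos are tagged "outdoor". That asymmetry is the reason such a graph would be directed rather than a symmetric co-occurrence matrix.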
Keywords/Search Tags: Video Representation, Similarity Measure, Video Retrieval, Advertising Video Classification, Multi-label Learning