Research On Short Video Annotation Based On Shot And Scene Context

Posted on: 2017-04-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T L Peng
GTID: 1108330488492577
Subject: Digital media technology and applications
Abstract/Summary:
With the rapid development of digital media, communication, and network technology, the quantity of multimedia data, including video data, is increasing by leaps and bounds. Short video is a complex kind of video data, and finding useful information in massive video collections has always been a problem for users. To address it, applications such as video indexing and video retrieval systems have been put forward. Video annotation is the key step in building these applications, and it has become a hot topic in digital media applications and computer vision.

From the perspective of semantic analysis, a video can be divided into several semantic units. Different semantic units carry different meanings, and semantic annotation can be performed on each unit. In this dissertation, on the basis of an in-depth analysis of video structure and video segmentation, videos are divided into semantic units and annotation is carried out on each unit. The research work can be summarized as follows:

(1) Based on the global and local features of video frames, a new shot detection method combining dynamic video texture and SIFT features is proposed. First, two adjacent frames are divided into uniform blocks, and the average gradient of each block is computed in RGB color space. The average gradients of all blocks together form the dynamic texture of the video. The dynamic textures of adjacent frames are then compared, and the SIFT features of the adjacent frames are matched. Finally, the algorithm decides whether a shot change has occurred by combining the dynamic texture comparison with the SIFT matching result.

(2) A video semantic annotation model based on shot events is proposed. Based on an analysis of video structure, the algorithm extracts the moving objects in the scene and the background color features of the key frames to represent a shot event, and extends the same representation to scenes. The theme of a video clip is then expressed by its set of shot events, combined with the contextual shot movements and surrounding backgrounds. This model represents the semantics of shots well and improves the accuracy of video semantic analysis.

(3) A new video annotation method based on semi-supervised clustering is proposed. From an event-driven perspective, the shot event is treated as the semantic unit, and groups of events are used to annotate short videos. A novel annotation algorithm based on semi-supervised K-means clustering is then proposed. The objective function of the clustering algorithm is optimized to obtain low coupling between clusters and high cohesion within clusters, and to reflect the local density of the data distribution within each cluster. The proposed method can also cluster heterogeneous data such as video, and it improves the accuracy of video annotation.

(4) A new video classification method based on contextual multi-kernel learning is proposed. Building on the traditional bag-of-words model, and using the spatial and semantic similarity between the key frames of adjacent shots, this dissertation presents a new video scene classification model. First, video clips are divided into shots, and the key frames of the shots are extracted and normalized to a common format. Next, the key frames are treated as image blocks and arranged in temporal order to form a composite image, from which SIFT and HSV features are extracted and embedded into a Hilbert space. Through multi-kernel learning, the algorithm selects appropriate kernel functions for training on each image and finally obtains a classification model with good classification performance.

These techniques can be applied to video classification, video indexing, video retrieval, video understanding, and video management. This broad applicability makes the research significant and valuable.
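The block-gradient comparison in contribution (1) can be illustrated with a minimal sketch. It covers only the first, dynamic-texture stage and is not the dissertation's implementation: the function names, the 4x4 block grid, the forward-difference gradient, the mean-absolute-difference distance, and the threshold are all assumptions made for illustration, and the SIFT matching stage is omitted.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B)
Frame = List[List[Pixel]]      # frame[y][x]


def block_mean_gradients(frame: Frame, grid: int = 4) -> List[float]:
    """Average gradient magnitude per block, averaged over the R, G, B channels.

    The gradient is approximated (assumption) by forward differences
    to the right-hand and lower neighbours of each pixel.
    """
    height, width = len(frame), len(frame[0])
    sums = [0.0] * (grid * grid)
    counts = [0] * (grid * grid)
    for y in range(height - 1):
        for x in range(width - 1):
            here, right, below = frame[y][x], frame[y][x + 1], frame[y + 1][x]
            g = sum(abs(right[c] - here[c]) + abs(below[c] - here[c])
                    for c in range(3)) / 3.0
            # Map the pixel to its block in a grid x grid layout.
            b = (y * grid // height) * grid + (x * grid // width)
            sums[b] += g
            counts[b] += 1
    return [s / n if n else 0.0 for s, n in zip(sums, counts)]


def texture_distance(a: Frame, b: Frame, grid: int = 4) -> float:
    """Mean absolute difference between the block-gradient vectors of two frames."""
    ga, gb = block_mean_gradients(a, grid), block_mean_gradients(b, grid)
    return sum(abs(p - q) for p, q in zip(ga, gb)) / len(ga)


def candidate_cuts(frames: List[Frame], threshold: float, grid: int = 4) -> List[int]:
    """Indices i where the change from frames[i - 1] to frames[i] exceeds the threshold."""
    return [i for i in range(1, len(frames))
            if texture_distance(frames[i - 1], frames[i], grid) > threshold]
```

Boundaries flagged this way are only candidates. In the method described above they would still be checked against SIFT feature matching between the two frames, since a texture change alone may not distinguish a shot change from other large changes within a shot.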
Keywords/Search Tags: Dynamic Texture, SIFT Feature, Shot Detection, Video Event, Video Annotation, Context, Semi-supervised Learning, Multi-kernel Learning