
Research and Implementation on Context-Based Video Multi-Semantic Annotation

Posted on: 2017-10-09    Degree: Master    Type: Thesis
Country: China    Candidate: B H Yu    Full Text: PDF
GTID: 2428330590988896    Subject: Software engineering
Abstract/Summary:
How to efficiently retrieve and manage vast amounts of video data has become a challenge in the field of multimedia technology. Of all video features, semantic features are the closest to the user's understanding, and video annotation based on machine learning can extract them effectively and quickly. However, video content is often complex and rich, and the complexity, fuzziness, and subjectivity of video semantics create a "semantic gap" between low-level features and high-level semantics, so existing annotation methods find it difficult to achieve satisfactory results. At the same time, the semantic concepts in rich video content are correlated with one another and exhibit contextual correlation in time; making full use of these two characteristics helps to narrow the "semantic gap" and thus improve annotation accuracy. To address the duplicated labor, low efficiency, and poor results of existing semantic annotation, this paper proposes a video annotation method based on the temporal and semantic context of video data, and optimizes the results with a fuzzy graph. The main research work can be summarized as follows:

First, a multi-semantic automatic annotation method based on video content context is proposed. The method draws on both the hierarchical and the semantic information of the video. From the hierarchical aspect, it constructs a tree model out of the video, its feature frames, and their regions, from top to bottom. From the semantic aspect, it translates regions into corresponding labels using a trained classification model.

Second, an improved feature-frame extraction method is proposed on the basis of existing clustering and compressed-domain techniques. The I-frames in the compressed domain are clustered, cluster centers are computed according to the similarity between frames, and those centers are extracted as the feature frames. Each frame's contribution to the video is then derived from the size of its cluster. Experiments show that this method achieves good performance.

Third, each extracted feature frame is divided into several regions, and the regions are translated into text labels by a convolutional neural network. Because an image may contain more than one object, regions must be divided according to the actual content, and this paper proposes an improved region-division method. The Caffe framework is trained on the ImageNet 2014 dataset to obtain an image classification model, which assigns an object concept to each region. Then, following the vertical structure of the video, the video tree model is built and preliminary annotation results are obtained.

Finally, the correlations between video semantic concepts are analyzed. For video data from scenic areas, the frequencies of object concepts in the videos and the relationships between them are gathered statistically, and a semantic fuzzy graph is constructed from these concept relationships. The annotation results are then optimized with this fuzzy graph. The proposed method can recognize multiple objects contained in a video, which reflects the richness of video content well and remedies the deficiency of traditional methods that classify a video into only one type. In addition, the three methods proposed in this paper were tested on corresponding datasets and, in comparison with existing methods, demonstrated good performance.
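The abstract does not spell out the data structures behind the video tree model, so the following is a minimal Python sketch of the video -> feature frame -> region hierarchy it describes; all class and field names (VideoTree, FeatureFrame, Region, weight, bbox) are illustrative assumptions rather than the author's implementation.

    from dataclasses import dataclass, field
    from typing import List


    @dataclass
    class Region:
        """A spatial region of a feature frame with its predicted concept label."""
        bbox: tuple           # (x, y, w, h) in pixel coordinates
        label: str = ""       # object concept assigned by the classifier
        score: float = 0.0    # classifier confidence


    @dataclass
    class FeatureFrame:
        """A representative frame extracted from the video (a cluster center)."""
        index: int                      # frame index in the original video
        weight: float = 0.0             # contribution to the video (from cluster size)
        regions: List[Region] = field(default_factory=list)


    @dataclass
    class VideoTree:
        """Top-down tree: video -> feature frames -> regions, as described above."""
        name: str
        frames: List[FeatureFrame] = field(default_factory=list)

        def labels(self) -> List[str]:
            """Collect the multi-semantic annotation of the whole video."""
            return sorted({r.label for f in self.frames for r in f.regions if r.label})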
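The exact clustering algorithm and similarity measure are the thesis's own improvement and are not reproduced here. As a stand-in, the sketch below clusters per-I-frame color histograms with plain k-means, takes the frame nearest each cluster center as a feature frame, and weights it by cluster size, as the abstract describes; extracting the I-frames and their histograms from the compressed stream (e.g. with ffmpeg) is assumed to happen upstream.

    import numpy as np
    from sklearn.cluster import KMeans

    def extract_feature_frames(iframe_hists, n_clusters=5):
        """Cluster I-frame histograms; return (frame_index, weight) pairs.

        iframe_hists: array of shape (n_iframes, n_bins), one color histogram
        per I-frame taken from the compressed domain.
        """
        X = np.asarray(iframe_hists, dtype=float)
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)

        feature_frames = []
        for c in range(n_clusters):
            members = np.where(km.labels_ == c)[0]
            if members.size == 0:
                continue
            # the frame closest to the cluster center represents the cluster
            dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
            center_frame = members[np.argmin(dists)]
            # contribution to the video is proportional to cluster size
            weight = members.size / len(X)
            feature_frames.append((int(center_frame), float(weight)))
        return sorted(feature_frames)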
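A hedged sketch of the region-to-label step using Caffe's classic Python interface. The deploy definition, weights file, and synset list below are placeholders: the thesis trains its own model on ImageNet 2014, and the improved region-division step is assumed to have already produced the boxes.

    import caffe
    import numpy as np

    # Placeholder paths: any compatible deploy definition and weights file
    # would do; the thesis uses its own ImageNet-2014-trained model.
    MODEL_DEF = "deploy.prototxt"
    WEIGHTS = "imagenet2014.caffemodel"

    net = caffe.Classifier(MODEL_DEF, WEIGHTS,
                           channel_swap=(2, 1, 0),   # Caffe expects BGR
                           raw_scale=255,
                           image_dims=(256, 256))

    def label_regions(frame, regions, synset_words):
        """Crop each region from the frame and classify it with the CNN.

        frame:        H x W x 3 RGB image as a float array in [0, 1]
        regions:      list of (x, y, w, h) boxes from the region-division step
        synset_words: list mapping class index -> concept name
        """
        labels = []
        for (x, y, w, h) in regions:
            crop = frame[y:y + h, x:x + w]
            probs = net.predict([crop], oversample=False)[0]
            labels.append((synset_words[int(np.argmax(probs))], float(probs.max())))
        return labels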
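The abstract does not give the membership function of the semantic fuzzy graph, so the sketch below assumes a simple one (co-occurrence count normalized by the rarer concept's frequency) and re-scores each candidate label by its strongest contextual support among the other candidates; alpha and keep are hypothetical tuning parameters, not values from the thesis.

    from collections import Counter
    from itertools import combinations

    def build_fuzzy_graph(videos_concepts):
        """Build concept co-occurrence weights in [0, 1] from annotated videos.

        videos_concepts: list of concept sets, one per training video
                         (e.g. scenic-area footage, as in the paper).
        """
        freq, cooc = Counter(), Counter()
        for concepts in videos_concepts:
            freq.update(concepts)
            cooc.update(frozenset(p) for p in combinations(sorted(concepts), 2))
        weights = {}
        for pair, count in cooc.items():
            a, b = tuple(pair)
            # assumed membership degree: co-occurrences over the rarer count
            weights[pair] = count / min(freq[a], freq[b])
        return weights

    def refine_labels(candidates, graph, alpha=0.5, keep=0.4):
        """Re-score candidate (label, score) pairs by mutual support in the graph."""
        refined = []
        for label, score in candidates:
            support = [graph.get(frozenset((label, other)), 0.0)
                       for other, _ in candidates if other != label]
            context = max(support, default=0.0)
            refined.append((label, (1 - alpha) * score + alpha * context))
        return [(l, s) for l, s in refined if s >= keep]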
Keywords/Search Tags: Video, Multi-Semantic Annotation, Feature Frame, Neural Network, Video Tree