
Research on Video Low-Level Features and Middle-Level Semantic Representation

Posted on: 2017-08-12
Degree: Master
Type: Thesis
Country: China
Candidate: C H Jiang
Full Text: PDF
GTID: 2358330503481869
Subject: Software engineering
Abstract/Summary:
With the rapid development of Internet technology and the growing popularity of smartphones, cameras, and other recording devices, the amount of video data on the Internet has grown explosively. Video classification, retrieval, and management have therefore become research hot spots, and video content analysis is a prerequisite for all of these applications.

Current state-of-the-art video content analysis methods face several problems. First, the low-level features extracted from video contain a great deal of redundant information, and existing approaches do not compress this redundancy effectively while preserving the essential properties of the features (robustness and discriminability). Second, there is a semantic gap between low-level features and high-level semantics, which for video include emotional, behavioral, and scene information. Third, video content classification modeling remains difficult: given the huge volume of video content features, existing approaches do not model them effectively for video classification or retrieval applications.

To address these problems, this thesis studies the low-level features and middle-level semantic representation of videos, and models the middle-level semantic representation for video scene classification. The main contributions are as follows:

(1) For video low-level features, because existing key frame extraction algorithms do not generalize to massive video data, we uniformly sample several key frames per second. We then extract lower-dimensional SIFT (S-SIFT) features from each key frame and apply a local density clustering method to obtain patches of S-SIFT (PSIFT), which compress the redundant information in the video's spatial domain.
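The abstract does not say which local density clustering method is used, but the properties it later claims (no need to guess initial cluster centers, no iteration until convergence) match density-peak clustering in the style of Rodriguez and Laio. The sketch below is an illustrative NumPy version under that assumption; the function name and the cutoff parameter `d_c` are our own, not from the thesis.

```python
import numpy as np

def density_peak_clusters(points, d_c, n_clusters):
    """Illustrative density-peak clustering (Rodriguez & Laio style).

    rho_i  : number of neighbours within the cutoff distance d_c
    delta_i: distance to the nearest point of strictly higher density
    Centers are the points with the largest rho * delta product, so no
    initial centers and no iteration to convergence are needed.
    """
    # pairwise Euclidean distances between all points
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # local density: neighbours within the cutoff, excluding the point itself
    rho = (dist < d_c).sum(axis=1) - 1
    order = np.argsort(-rho)                 # point indices, densest first
    # delta: the globally densest point gets the maximum distance by convention
    delta = np.full(len(points), dist.max())
    for rank in range(1, len(points)):
        i = order[rank]
        delta[i] = dist[i, order[:rank]].min()
    # centers = largest rho * delta products
    centers = np.argsort(-(rho * delta))[:n_clusters]
    labels = np.full(len(points), -1)
    labels[centers] = np.arange(n_clusters)
    # assign the rest, densest first, to their nearest denser neighbour's cluster
    for rank in range(len(points)):
        i = order[rank]
        if labels[i] == -1:
            denser = order[:rank]
            labels[i] = labels[denser[np.argmin(dist[i, denser])]]
    return labels, centers
```

In the thesis's pipeline such a step would group nearby S-SIFT descriptors within a key frame into PSIFT patches, collapsing spatially redundant features into one representative per density peak.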
Because adjacent video frames share a large amount of similar information, we further use an object tracking method to track the PSIFT features across the key frames, compressing the redundant information again in the video's temporal domain. Experimental results show that our temporally compressed PSIFT (TCPSIFT) not only effectively reduces the video's redundant information but also preserves the basic robustness and discriminability of the features.

(2) For the video middle-level semantic representation, after analyzing the shortcomings of existing image middle-level semantic extraction algorithms, we combine them with context-aware saliency detection at the stage of choosing the initial patches: each patch's weight within the whole image is computed from the saliency map, and the patches with higher weights are selected for the subsequent clustering and classification stage. This yields patches that carry more semantic information.

(3) Building on the video middle-level semantic representation, we propose a model for video scene classification based on the bag-of-words model, and use the local density clustering method to overcome the drawbacks of traditional K-means-based text classification models, namely the uncertainty of the initial cluster centers and the need to iterate until convergence.
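To illustrate the bag-of-words step in contribution (3): once a codebook of semantic patches has been learned (in the thesis, via local density clustering rather than K-means), each key frame is represented by a histogram counting how often its local descriptors fall on each codeword. The sketch below is a minimal NumPy version; the function and variable names are illustrative, not from the thesis.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Quantize local descriptors (e.g. S-SIFT/PSIFT vectors) against a
    visual codebook and return an L1-normalized bag-of-words histogram."""
    # squared Euclidean distance from every descriptor to every codeword
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    words = d2.argmin(axis=1)                # nearest codeword per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()                 # normalize so frames are comparable
```

A scene classifier is then trained on these fixed-length histograms; the thesis's contribution lies in how the codebook patches are chosen (saliency-weighted, density-clustered), not in the histogram step itself.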
Keywords/Search Tags: Low-level features, Feature extraction, Middle-level semantic representation, Scene classification