
Research On Double Modal Video Content Recognition Based On Spatio-temporal Features And Bag Of Words Model

Posted on: 2012-10-24  Degree: Master  Type: Thesis
Country: China  Candidate: B Feng
GTID: 2178330338984185  Subject: Communication and Information System
Abstract/Summary:
Recognition and detection of video content has been a popular topic in computer vision over the past decade. With the growing demand for video monitoring and the development of the Internet, more and more applications rely on automatic recognition and detection in video content. This technique can automatically analyze the motion patterns of objects such as human bodies in videos and detect harmful content such as violence or eroticism. It is also expected to play an important role in applications such as human-machine interaction and video retrieval.

This thesis focuses on the recognition of human actions and the detection of harmful content in videos. Because the two applications differ, the thesis uses different techniques for each. These include the extraction and analysis of spatio-temporal features, the fusion of a large number of features from both the video and audio modalities, the generation of bag-of-words vectors for the video and audio modalities, an SVM classification framework with a second round of prediction, and a hierarchical detection architecture for harmful content in videos.

For human action recognition, a novel space-time SURF descriptor is presented, together with its application to action recognition in combination with a bag-of-video-words approach. The new descriptor can better represent the spatio-temporal nature of video data for action recognition. A bag-of-words approach is used to represent videos, and a soft-weighting strategy is exploited. Experiments are conducted on the KTH action recognition dataset. In the experiments, a voting system with a second-pass prediction is used to classify actions, alongside the traditional classification framework. The results show that this approach outperforms the previously proposed scheme in both speed and accuracy, and that the new voting scheme works better than the traditional one for some actions.

For harmful content detection, the thesis proposes two targeted detection processes based on the natural characteristics of violence and eroticism. A modified structure tensor histogram and some simple color features are combined with the audio BOW vectors for hierarchical detection of shots that contain violence. For eroticism detection, a large number of MPEG-7 visual descriptors are fused and applied to the ROI in key frames, while audio BOW vectors are applied to the shots that contain these key frames. The experiments demonstrate good performance in both accuracy and efficiency.
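
The abstract mentions a bag-of-words representation with a soft-weighting strategy but does not spell out the weighting itself. As a rough illustration only, the sketch below lets each local descriptor vote for its nearest codewords, with the vote halved at each rank, and then normalizes the histogram. The function name `soft_bow_histogram`, the `top_k` parameter, and the 1/2^rank weights are assumptions made for this example, not the scheme used in the thesis.

```python
import math


def soft_bow_histogram(descriptors, codebook, top_k=2):
    """Build an L1-normalized, soft-weighted bag-of-words histogram.

    Each descriptor votes for its top_k nearest codewords. The vote for
    the codeword at rank r (r = 0 is the nearest) is 1 / 2**r. This
    weighting is an assumed illustration, not the thesis's own scheme.
    """
    hist = [0.0] * len(codebook)
    k = min(top_k, len(codebook))
    for d in descriptors:
        # Rank codewords by Euclidean distance to this descriptor.
        ranked = sorted(range(len(codebook)),
                        key=lambda j: math.dist(d, codebook[j]))
        for rank, j in enumerate(ranked[:k]):
            hist[j] += 1.0 / (2 ** rank)
    total = sum(hist)
    # Leave an all-zero histogram untouched when there are no descriptors.
    return [h / total for h in hist] if total > 0 else hist


# Toy 2-D data; real descriptors would be high-dimensional feature vectors.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 0.1)]
print(soft_bow_histogram(descriptors, codebook))  # [0.5, 0.5, 0.0]
```

Compared with hard assignment (`top_k=1`), soft weighting spreads each descriptor over several codewords, which makes the histogram less sensitive to descriptors that fall near a boundary between two codewords. The resulting vector could then be passed to a classifier such as an SVM.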
Keywords/Search Tags:video content recognition and detection, visual–audio modal, spatio-temporal feature, bag of words model