Research On Double Modal Video Content Recognition Based On Spatio-temporal Features And Bag Of Words Model

Posted on:2012-10-24

Degree:Master

Type:Thesis

Country:China

Candidate:B Feng

Full Text:PDF

GTID:2178330338984185

Subject:Communication and Information System

Abstract/Summary:

Recognition and detection of video content has been a popular topic in computer vision over the past decade. With the growing demand for video surveillance and the development of the Internet, more and more applications rely on automatic recognition and detection of video content. This technique can automatically analyze the motion patterns of objects such as human bodies in videos, and can detect harmful content such as violence or eroticism. It is also valuable in applications such as human-machine interaction and video retrieval.

This thesis focuses on the recognition of human actions and the detection of harmful content in videos. Because the two applications differ, the thesis uses different techniques for each. These include the extraction and analysis of spatio-temporal features, the fusion of a large number of features from both the video and audio modalities, the generation of bag-of-words (BOW) vectors for the video and audio modalities, an SVM classification framework with a second round of prediction, and a hierarchical detection architecture for harmful content in videos.

For human action recognition, a novel space-time SURF descriptor is presented, together with its application to human action recognition in combination with a bag-of-video-words approach. The new descriptor better represents the spatio-temporal nature of video data for action recognition. A bag-of-words approach is used to represent videos, and a soft weighting strategy is exploited. Experiments are carried out on the KTH action recognition dataset. In the experiments, a voting system with a second-pass prediction is used to classify actions, alongside the traditional classification framework. The results show that this approach outperforms the previously proposed scheme in both speed and accuracy, and that the new voting scheme works better than the traditional one for some actions.

For harmful content detection, this thesis proposes two targeted detection processes based on the natural characteristics of violence and eroticism. A modified structure tensor histogram and some simple color features are combined with the audio BOW vectors for hierarchical detection of shots that contain violence. For eroticism detection, a large number of MPEG-7 visual descriptors are fused and applied to the ROI in key frames, while audio BOW vectors are applied to the shots that contain these key frames. The experiments demonstrate good performance in both accuracy and efficiency.
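
The abstract mentions a bag-of-words representation with a soft weighting strategy but does not give its formula. The following is a minimal sketch of one common form of soft weighting, not the thesis's own method: each local descriptor votes for its `top_k` nearest codewords, and the vote is halved at each rank. The function name `soft_bow_histogram`, the `top_k` parameter, and the halving rule are all assumptions made for illustration.

```python
def soft_bow_histogram(descriptors, codebook, top_k=3):
    """Build a soft-weighted bag-of-words histogram for one video.

    descriptors: list of feature vectors extracted from the video.
    codebook:    list of codeword vectors (for example, k-means centres).
    Assumed weighting rule: the rank-r nearest codeword (r = 0, 1, ...)
    receives a vote of 0.5 ** r from each descriptor.
    """
    top_k = min(top_k, len(codebook))
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        # Squared Euclidean distance from this descriptor to every codeword.
        dists = [sum((a - b) ** 2 for a, b in zip(desc, word))
                 for word in codebook]
        # Indices of the top_k closest codewords, nearest first.
        ranked = sorted(range(len(codebook)), key=dists.__getitem__)[:top_k]
        for rank, idx in enumerate(ranked):
            hist[idx] += 0.5 ** rank
    # L1-normalise so videos with different descriptor counts are comparable.
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist
```

Compared with hard assignment, where each descriptor counts toward only its single nearest codeword, this spreads a descriptor's contribution over several nearby codewords, which can reduce sensitivity to quantization boundaries. The resulting vector could then be passed to a classifier such as an SVM.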

Keywords/Search Tags:

video content recognition and detection, visual–audio modal, spatio-temporal feature, bag of words model

Related items

1	Video Action Recognition Based On 2D Convolution Network Under Spatio-Temporal Feature Enhancement Mechanism
2	Research On Surveillance Video Synopsis Based On Spatio-Temporal Slice
3	Dynamic Gesture Recognition Based On Spatio-temporal Feature Representation And Dictionary Optimization
4	Research On Methods Of Video Content Analysis Based On Spatio-temporal Variation
5	Research On Spatio-temporal Correlation Feature Extraction And Recognition Of Multi-modal Tactile Signals
6	Human Action Recognition Based On Spatio-temporal Interest Points
7	Research On Violent Video Detection Algorithm Based On Bag Of Audio Words And Mpeg-7 Features
8	Research On Violent Video Detection Algorithm Based On Bag Of Audio Words And MPEG-7 Features
9	Analysis of video content using statistical spatio-temporal models
10	Content Based Robust Video Fingerprinting