
Research On Violent Video Detection Algorithm Based On Bag Of Audio Words And MPEG-7 Features

Posted on: 2011-12-01
Degree: Master
Type: Thesis
Country: China
Candidate: R J Li
Full Text: PDF
GTID: 2178330338984189
Subject: Communication and Information System
Abstract/Summary:
With the flourishing of the movie industry and the development of multimedia, many types of movies are available through the internet. People can easily differentiate among movie genres after watching them. For a computer, however, automatically recognizing the theme of a movie is a complicated task. In recent years, more and more attention has been paid to this problem in computer vision research. A computer can distinguish between types of video by comparing the video and audio features extracted from its binary data. Traditional content-based video classification relies on two kinds of features: visual and audio. The visual features include color, texture and motion, while the audio features are mainly low-level features such as audio bandwidth, frequency and Mel features.

On the other hand, some films contain many violent and horror scenes that are unsuitable for children. The government now pays more attention to regulating video on the network. For this reason, two methods of classifying violent videos are presented in this paper.

We first introduce a new method that identifies violent videos using a bag of audio words. The MPEG-7 audio descriptors are extracted first, including low-level features such as AudioSpectrumCentroid and AudioSpectrumSpread. The audio words are then built according to the MPEG-7 high-level descriptor AudioSignature, which is regarded as the "fingerprint" of the audio stream. A support vector machine classifies the feature vectors into two classes, violent and non-violent videos. The experimental results demonstrate that our method achieves good recall.

Combining these with video features, we then introduce two filtering models: the visual structure tensor filtering model and the fast audio filtering model. In the structure tensor model, we first extract the structure tensor features and then classify the candidate shots by face detection and violent audio event detection. In the fast audio model, we extract the audio features first and then classify the candidate shots by visual features. The experimental results show that the visual structure tensor model achieves high classification accuracy, while the audio model runs faster. Both models can be applied to violent video filtering on the internet.
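
The bag-of-audio-words idea described above can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not say how the audio words are built from AudioSignature, so the k-means codebook here is an assumption, the two-dimensional frame vectors are made-up stand-ins for MPEG-7 descriptors, and all function names are hypothetical. The resulting histograms are the kind of fixed-length feature vector that could then be passed to a support vector machine, a step not shown here.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(frame, codebook):
    """Index of the codebook entry (audio word) closest to the frame."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(frame, codebook[k]))


def build_codebook(frames, num_words, iterations=20, seed=0):
    """Cluster frame vectors with plain k-means (an assumed choice)."""
    rng = random.Random(seed)
    codebook = [list(f) for f in rng.sample(frames, num_words)]
    for _ in range(iterations):
        clusters = [[] for _ in range(num_words)]
        for frame in frames:
            clusters[nearest_word(frame, codebook)].append(frame)
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                codebook[k] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
    return codebook


def bag_of_audio_words(frames, codebook):
    """Normalized histogram of audio-word occurrences for one clip."""
    counts = [0] * len(codebook)
    for frame in frames:
        counts[nearest_word(frame, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Toy data: each frame is a made-up 2-D descriptor vector.
training_frames = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15],
                   [0.9, 0.8], [0.8, 0.9], [0.85, 0.85]]
codebook = build_codebook(training_frames, num_words=2)

clip_frames = [[0.12, 0.18], [0.88, 0.82], [0.9, 0.9]]
histogram = bag_of_audio_words(clip_frames, codebook)
print(histogram)
```

Every clip maps to a histogram of the same length regardless of its duration, which is what makes the representation convenient as classifier input.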
Keywords/Search Tags: video classification, bag of words, filtering model, support vector machine, MPEG-7 audio feature