Multi-label Video Classification Based On Multi-path Ensemble Network

Posted on: 2019-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: J X Cao
Full Text: PDF
GTID: 2428330566987585
Subject: Engineering
Abstract/Summary:
With the rise of platforms such as Tencent Video, Youku, and Iqiyi Video, the need for video search, classification, and recommendation has become increasingly urgent. At present, most automatic video classification systems rely on information such as video titles and subtitles; because of technical limitations, the content of the video itself is ignored. In recent years, thanks to the rapid development of deep learning algorithms in the image domain, video classification technology has made great progress, which makes automatic classification based on video content possible. This thesis applies deep learning and its powerful feature extraction capabilities to classify videos with multiple labels directly from their content. The contributions of this work are as follows:

1) We transfer multiple text feature extraction algorithms to the extraction of temporal features from video. A video consists of images (i.e. frames) much as a document consists of words, and a video has key frames just as a document has keywords. The difference between text and video is that adjacent frames of a video are similar. The transferred algorithms are therefore optimized so that they extract the temporal features of video better.

2) We improve the C-LSTM network by combining it with the Bi-LSTM and the Attention mechanism so that it can extract the key information of a video. Real-world video is highly redundant and contains a lot of useless and noisy information; only a small part of a video may play a key role in classification. With the Attention mechanism, the model assigns different weights to different parts of the video content and so extracts more discriminative features.

3) We improve the Temporal Segment Network (TSN) for multi-label video classification. By proposing a variety of complementary sampling strategies, we preserve as much important information as possible while reducing redundant video information. Following the idea of Bagging, we use Mixture of Experts as a new ensemble strategy: by sharing parameters and integrating the results of multiple paths in a single model, we achieve the effect of Bagging. Video classification performance therefore improves without training multiple models.

Finally, we design multiple experiments, which show that the C-Bi-LSTM Attention network and the Multi-path Ensemble Network achieve better performance on multi-label video classification data sets.
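The abstract describes two ideas that a short example can make concrete: weighting frames with attention, and combining the outputs of several paths with a Mixture of Experts. The following is a minimal sketch using only the Python standard library, not the thesis implementation. The function names (`attention_pool`, `mixture_of_experts`), the dot-product scoring against a `query` vector, and the per-label softmax gate are all assumptions made for illustration; the abstract does not specify these details.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Logistic function, used for independent per-label probabilities."""
    return 1.0 / (1.0 + math.exp(-x))


def attention_pool(frame_features, query):
    """Pool per-frame feature vectors into one vector (hypothetical helper).

    Each frame is scored by its dot product with `query` (an assumed
    scoring function); the softmax of the scores gives the frame weights.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frame_features]
    weights = softmax(scores)
    dim = len(frame_features[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights


def mixture_of_experts(path_logits, gate_logits):
    """Combine multi-path predictions for multi-label output (hypothetical helper).

    path_logits[p][j]: logit of path p for label j.
    gate_logits[p][j]: gate score of path p for label j (assumed layout).
    For each label, the gate scores are softmax-normalized across paths and
    used to average the per-path sigmoid probabilities.
    """
    num_paths = len(path_logits)
    num_labels = len(path_logits[0])
    probs = []
    for j in range(num_labels):
        gates = softmax([gate_logits[p][j] for p in range(num_paths)])
        probs.append(
            sum(g * sigmoid(path_logits[p][j]) for p, g in enumerate(gates))
        )
    return probs


# Toy usage: three frames with 2-D features, then two paths and two labels.
pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 1.0])
label_probs = mixture_of_experts([[0.0, 2.0], [0.0, -2.0]], [[0.0, 0.0], [0.0, 0.0]])
```

In a trained model the query vector, the path logits, and the gate logits would all come from learned layers that share a backbone; here they are fixed numbers so that the weighting and combination steps are easy to follow.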
Keywords/Search Tags:Video Classification, Multi-label Classification, Deep Learning