Short Video Classification Based On Modal Fusion

Posted on: 2021-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhang
Full Text: PDF
GTID: 2428330614963734
Subject: Electronic and communication engineering
Abstract/Summary:
With the continuous development of multimedia and network technology, a large amount of streaming media data is generated on the Internet and mobile platforms, and video accounts for a large proportion of it. Video classification is a basic research direction in computer vision. It is an important intermediate step in video annotation and video retrieval, and an important method for managing huge volumes of multimedia data. In recent years, deep learning has developed continuously and achieved very good results, and more and more researchers have applied it to video-related technologies. Video classification algorithms based on deep learning not only improve classification accuracy but also expand the number of categories that can be classified, and they have gradually been adopted in practical video classification tasks. Video classification based on deep learning therefore has high research and commercial value.

This thesis studies short video classification using deep learning techniques. First, the common types of deep learning networks are introduced, including convolutional neural networks, recurrent neural networks, and generative adversarial networks, along with commonly used deep learning programming frameworks and classification models. Then, since the main research object of this thesis is video data, the focus turns to feature extraction methods for each modality. Classic algorithms for traditional image and video feature extraction are introduced first, followed by deep-learning-based feature extraction algorithms for video, audio, and text. After that, the overall framework of the proposed algorithm is designed: the principles of the clustering network and the attention mechanism network are studied, and then a multi-modal feature fusion method is studied to build the overall network framework. Finally, based on the proposed algorithm, a large number of comparative experiments are carried out on the relevant data sets, and the experimental data are analyzed in detail. Based on the experimental results, several improvements are proposed, and experiments are conducted on a public data set to compare with other models.
Keywords/Search Tags: video classification, deep learning, clustering network, attention mechanism