
Research On Video Annotation Technology Based On Multimodality

Posted on: 2021-01-28  Degree: Master  Type: Thesis
Country: China  Candidate: W Liu  Full Text: PDF
GTID: 2518306107953159  Subject: Computer technology
Abstract/Summary:
Video annotation technology analyzes video information, understands video content, and annotates videos with an accuracy approaching that of humans. As the volume of video on the Internet grows, algorithms that find the videos users are interested in are urgently needed, and such algorithms are inseparable from video annotation. Video annotation technology is therefore of great significance.

Video annotation based on visual features alone extracts features from video frames with a convolutional neural network, aggregates the features over time, and finally annotates the video. The absence of audio features, together with ignoring the varying importance of individual frames, limits the accuracy of this approach. To make video annotation more accurate, and to address these shortcomings of existing video annotation models and algorithms, this thesis proposes a multimodal video annotation technique that combines visual and audio features.

First, to extract visual features more effectively, the key frames of the video are extracted before frame features are computed, removing redundant frames; a deep convolutional neural network then extracts the visual information of each frame. An attention mechanism is added during aggregation to account for the importance of each frame to the video, and the NetAC pooling model based on this attention mechanism is proposed. For the audio track, the log-Mel spectrum of the audio is first extracted, the resulting hand-crafted audio features are processed by a deep convolutional neural network, and the processed multi-segment audio frame features are fed into a learning pool for aggregation. The visual and audio features are then fused, with a gate mechanism capturing the dependencies between features, to obtain the final video features, which are fed into a decoder to produce the final video annotation result.

Video annotation experiments with the NetAC pooling model and several other pooling models were conducted on the audio modality, the visual modality, and both modalities combined. The results verify the effectiveness of the NetAC pooling model and show that audio is an important feature of video that can effectively improve the accuracy of video annotation.
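The attention-based temporal pooling and gated fusion described above can be sketched as follows. This is a minimal NumPy illustration, not the thesis's implementation: the dot-product scoring vector `w` and the single-layer sigmoid gate `(W, b)` are assumed parameterizations, since the exact form of the NetAC model and gate mechanism is not given in the abstract.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(frame_feats, w):
    # frame_feats: (T, D) per-frame CNN features; w: (D,) learnable
    # scoring vector (hypothetical parameterization of the attention)
    scores = softmax(frame_feats @ w)   # (T,) importance of each frame
    return scores @ frame_feats         # (D,) importance-weighted sum over time

def gated_fusion(visual, audio, W, b):
    # concatenate the two modalities, then let a sigmoid gate reweight
    # each dimension, modeling dependencies between the fused features
    x = np.concatenate([visual, audio])          # (Dv + Da,)
    gate = 1.0 / (1.0 + np.exp(-(W @ x + b)))    # values in (0, 1)
    return gate * x
```

In a full pipeline, the pooled visual vector and the learning-pool audio vector would be fused this way and the result passed to the decoder; here `W` and `b` would be trained jointly with the rest of the network.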
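The log-Mel spectrum used as the audio front end can likewise be sketched in plain NumPy. The frame length, hop size, and Mel-band count below are common illustrative defaults, not values taken from the thesis.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(y, sr=16000, n_fft=512, hop=256, n_mels=64, eps=1e-10):
    # frame the waveform, window each frame, and take power spectra
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop : i * hop + n_fft] for i in range(n_frames)])
    window = np.hanning(n_fft)
    spec = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2  # (T, n_fft//2+1)

    # triangular filterbank with bands spaced evenly on the Mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)

    # project onto the Mel bands and compress with a log
    return np.log(spec @ fb.T + eps)  # (T, n_mels)
```

The resulting (time, Mel-band) matrix is what a deep convolutional network would consume before the learning pool aggregates the per-segment audio features.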
Keywords/Search Tags:video annotation, multi-modality, key frame extraction, convolutional neural network, learning pool