Research On Semantic Guiding Video Captioning Methods With Attention Mechanism And Memory Network

Posted on:2020-10-04

Degree:Master

Type:Thesis

Country:China

Candidate:J Yuan

Full Text:PDF

GTID:2428330602950201

Subject:Signal and Information Processing

Abstract/Summary:

The task of describing a video in natural language is called video captioning. It combines key techniques from natural language processing and computer vision, and progress on it advances cross-modal analysis. In recent years, more and more researchers have worked on video captioning. Generating a sentence for a video is a complex task: the model must not only identify the different objects in the video and the interactions between them, but also describe the video content in natural language. Currently, most video captioning methods follow a sequence learning approach, which first uses convolutional neural networks to extract features from a video and then uses recurrent neural networks to generate a sentence description from the visual features. The approach in this thesis is also based on sequence learning. Its main contributions are summarized as follows:

(1) We propose a video captioning method based on deep visual features and semantic attributes. Most existing video captioning methods use only the visual information of a video and ignore semantic information, which is very important for video description. This method therefore uses the visual information of the video and also exploits semantic information as guidance when generating the description. First, the method uses two kinds of convolutional networks to extract features from single frames and from successive frames of the video, respectively, and then averages those features to obtain the visual object features and motion features of the video. Then, three types of semantic attributes are obtained from the sentence descriptions in the training set, and a separate semantic attribute predictor is trained for each type. Finally, we propose a semantic guiding long short-term memory network, which uses the semantic attributes to guide the generation of the video description. Experiments on the MSVD dataset show improvements on many metrics compared with state-of-the-art methods.

(2) We propose a video captioning method that combines attention mechanisms and memory networks. To fully capture the object and motion information in the video, this method integrates an attention mechanism and a memory network into the semantic guiding long short-term memory network. First, the method uses the attention mechanism to selectively focus on the most significant visual content, so that at each time step the model attends to the most significant objects and actions in the video. Then, the method increases the capacity of the memory cells in the long short-term memory network by adding an external memory network, which interacts with the internal state of the long short-term memory network through read and write operations. Finally, the output features of the attention mechanism and the information read from the memory network are fed into the semantic guiding long short-term memory network to generate the video description. Extensive experiments on the MSVD dataset show that the method is superior to state-of-the-art methods.
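The two feature-aggregation steps mentioned in the abstract, averaging frame features and attending to the most significant frames, can be illustrated with a minimal sketch. The abstract does not give the scoring function, feature dimensions, or network details, so everything here is an assumption for illustration: the function names `mean_pool` and `soft_attention` are hypothetical, the attention score is a plain dot product, and `query` merely stands in for the decoder's hidden state at one time step.

```python
import math

def mean_pool(frame_features):
    """Average per-frame feature vectors into one video-level vector."""
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(f[d] for f in frame_features) / n for d in range(dim)]

def soft_attention(frame_features, query):
    """Weight frames by a softmax over dot-product scores (assumed scoring).

    Returns the attention weights and the weighted sum of frame features.
    """
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query))
              for f in frame_features]
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frame_features[0])
    context = [sum(w * f[d] for w, f in zip(weights, frame_features))
               for d in range(dim)]
    return weights, context

# Toy data: three frames with 2-dimensional features (illustrative only).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled = mean_pool(frames)
weights, context = soft_attention(frames, query=[2.0, 0.0])
```

Mean pooling treats every frame equally, whereas the attention weights change with the query, which is why frames that match the current decoding state contribute more to `context`.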

Keywords/Search Tags:

Video captioning, multi-feature representation, semantic attributes, attention mechanism, memory network

Related items

1	Research On Image Captioning Algorithm Based On Deep Learning
2	Video Captioning Algorithms Based On Multi-head Attention Mechanism
3	Research On Intelligent Semantics Generation For Visual Data
4	Spatio-temporal Attention Model For Video Captioning
5	Researches On Short Video Captioning Based On Deep Learning
6	Research On Video Captioning Algorithm Based On Attention Mechanism
7	Research And Application Of Video Captioning Technology Based On Deep Learning
8	Research On Image Captioning Generation Based On Faster R-CNN And Visual Attention
9	Image Captioning Based On Adaptive Visual Attention Mechanism
10	Research On Video Captioning Based On Deep Learning