Video Question Answering Based On A Forget Memory Network

Posted on:2019-09-17

Degree:Master

Type:Thesis

Country:China

Candidate:Y Y Ge

Full Text:PDF

GTID:2428330593951035

Subject:Pattern Recognition and Intelligent Systems

Abstract/Summary:

In recent years, as deep learning techniques have been studied in depth, computer vision and natural language processing have made great progress. In computer vision, image classification, object detection, action recognition, and video classification perform well on several public datasets. In natural language processing, text classification, language modeling, speech recognition, and machine translation have also achieved good performance and have been put into practical use. The age of artificial intelligence requires machines to have cognitive ability: a machine needs not only good visual recognition but also the ability to understand natural language as humans do.

Video question answering has received much attention in recent years. It combines computer vision and natural language processing to answer questions about the visual content of video clips. Because video data is complex, only a few methods have been proposed for video question answering. Compared with image question answering, video question answering faces more challenges. It must exploit the temporal information of the video clip. Moreover, most regions of a video frame are irrelevant to the question and can be regarded as noise. The problems to be solved are therefore how to find the region features that are useful for the question, forget the irrelevant region features, fuse the video and text features, and exploit the temporal information of the video.

To address these problems, we propose a forget memory network for video question answering. For a sequence of frames extracted from a video clip, we use convolutional neural networks to extract frame features. Then, conditioned on the question, the forget memory network selects the region features that are useful for the question and forgets the irrelevant ones. If the video clips come with text descriptions, the forget memory network can also process the text information. The video features and text features are then fused, and the fused features are used to answer the question. We also extend the forget memory network: after using it to obtain the frame features, we feed the sequence of frame features into a gated recurrent unit (GRU) model and use the output of the last timestep to represent the video. According to the thesis, this captures more spatial information of the video. The proposed approaches achieve good performance on the MovieQA [1] and TACoS [2] datasets.
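The pipeline described in the abstract (question-guided selection of region features per frame, followed by a GRU over the frame sequence) can be illustrated with a minimal pure-Python sketch. The abstract does not give the network's equations, so everything below is an assumption made for illustration: the gate is taken to be a sigmoid of the dot product between each region feature and the question vector, the gated regions are averaged into one frame feature, and a standard GRU cell with randomly initialized weights consumes the frames. The names `forget_gate_pool`, `gru_step`, `encode_video`, and `random_matrix` are hypothetical and do not come from the thesis.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, v):
    return [dot(row, v) for row in W]


def forget_gate_pool(regions, question):
    """Pool one frame's region features, down-weighting irrelevant regions.

    Assumption: relevance gate = sigmoid(region . question). The thesis may
    define its gate differently.
    """
    gates = [sigmoid(dot(r, question)) for r in regions]
    total = sum(gates)  # always > 0 because sigmoid is strictly positive
    pooled = [sum(g * r[d] for g, r in zip(gates, regions)) / total
              for d in range(len(question))]
    return pooled, gates


def gru_step(h, x, params):
    """One step of a standard GRU cell (biases omitted for brevity)."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), matvec(Ur, h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(Wh, x), matvec(Uh, rh))]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def encode_video(frames, question, params, hidden_size):
    """Run the gated pooling on every frame, then a GRU over the frames.

    Returns the hidden state of the last timestep as the video feature.
    """
    h = [0.0] * hidden_size
    for regions in frames:
        pooled, _ = forget_gate_pool(regions, question)
        h = gru_step(h, pooled, params)
    return h


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    feat_dim, hidden_size = 4, 3
    # Six weight matrices: input-to-hidden (H x D) and hidden-to-hidden (H x H).
    params = tuple(
        random_matrix(hidden_size, feat_dim if i % 2 == 0 else hidden_size, rng)
        for i in range(6)
    )
    # 5 frames, each with 6 region feature vectors (stand-ins for CNN outputs).
    frames = [[[rng.uniform(-1, 1) for _ in range(feat_dim)] for _ in range(6)]
              for _ in range(5)]
    question = [rng.uniform(-1, 1) for _ in range(feat_dim)]
    print(encode_video(frames, question, params, hidden_size))
```

In this sketch a region whose feature points away from the question vector receives a gate near 0 and contributes little to the frame feature, which is one simple way to read "forgetting" irrelevant regions. A real system would learn the gate and GRU weights and would take the region features from a convolutional network.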

Keywords/Search Tags:

Image Question Answering, Video Question Answering, Convolutional Neural Network, Forget Memory Network

Related items

1	Research On Affective Visual Question Answering
2	Research And Application Of Key Technologies Of Community Question Answering
3	Research On Visual Question Answering Based On Multi-Channel CNN-LSTM
4	Research On Deep Learning Algorithm For Automatic Question Answering
5	Research On Key Techniques Of Knowledge Base Question Answering Based On Multi-granularity Semantic Matching
6	Study Of Question Recommendation In Community Question Answering
7	Video Question Answering Based On Deep Memory Fusion Method
8	Research On Robot Question Answering System For Freshmen Register In Colleges And Universities
9	The Study On Routing Questions In Community Question Answering
10	Multi-Grained Hierarchical Attentional Recurrent Network For Video Question Answering