
Multimodal Visual Question Answering Methods Based on Action Semantics

Posted on: 2020-10-04    Degree: Master    Type: Thesis
Country: China    Candidate: J W Lian    Full Text: PDF
GTID: 2428330590973934    Subject: Computer Science and Technology
Abstract/Summary:
Human experience of the world is multimodal, involving images, sounds, smells, and more. To obtain information more efficiently, computers are expected to understand and process multimodal data. Visual question answering is a popular research direction for multimodal data that combines computer vision and natural language processing: given an input image and a question, it produces a corresponding answer, and it has promising applications in fields such as security and children's education.

Current multimodal visual question answering methods cannot further understand image content in terms of specific application scenarios, and the scenarios they target are too broad. Although these methods can distinguish different types of questions across different scenes and give relevant answers, their accuracy on related questions within the same scene is still unsatisfactory. On the other hand, when extracting features from multimodal data, current methods do not fully consider the characteristics of the visual question answering task: they simply extract features from each single modality, and their feature representations are too weak to capture deep semantic information.

To address these shortcomings, we propose a multimodal visual question answering method based on action semantics. In real application scenarios, people's questions about images often concern interaction information. To tackle the overly broad application scenario, we propose ASI-Net, a multi-branch action semantic information extraction network based on an attention mechanism, which helps the model focus on learning interaction information. Through the attention mechanism, the context surrounding human and object instances is further extracted and integrated with the spatial information of those instances to detect interactions in the image, so that the model extracts action semantic information.

To address the insufficient feature representation of multimodal data in current visual question answering methods, we propose a feature extraction method based on a bidirectional attention mechanism. First, the model automatically detects object instances in the image and extracts features at the corresponding positions. Then, different weights are dynamically assigned to the different object-instance features under the guidance of the question. This improves the model's ability to represent multimodal data and allows it to learn richer semantic information.

Both the action semantic information extraction network and the bidirectional attention feature extraction method aim to improve the visual question answering model. In this thesis, the action semantic information extraction network and the multimodal feature extraction network are fused into ASM-Net, a multimodal visual question answering model based on action semantics. Experiments show that our method reaches 70.13% accuracy on open-ended questions, higher than mainstream visual question answering methods, and its accuracy on interaction-related questions exceeds current models by 2.18 percentage points.
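To make the ASI-Net idea concrete, the following is a minimal sketch of one interaction branch: an attention over the context surrounding detected human/object regions, fused with a spatial-geometry branch to score the interaction. It assumes PyTorch; the module name InteractionBranch, the feature dimensions, and the number of action classes are illustrative assumptions, not the thesis implementation.

# Hedged sketch of an ASI-Net-style interaction branch (assumed PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class InteractionBranch(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512, num_actions=117):
        super().__init__()
        # attention over context features surrounding the human/object boxes
        self.ctx_attn = nn.Linear(feat_dim, 1)
        # spatial branch: encodes the relative geometry of the box pair
        self.spatial_mlp = nn.Sequential(
            nn.Linear(8, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim))
        self.classifier = nn.Linear(feat_dim * 2 + hidden_dim, num_actions)

    def forward(self, human_feat, object_feat, ctx_feats, box_pair_geom):
        # human_feat, object_feat: (B, feat_dim) pooled region features
        # ctx_feats: (B, N, feat_dim) features of surrounding regions
        # box_pair_geom: (B, 8) normalized coordinates of the human/object boxes
        attn = F.softmax(self.ctx_attn(ctx_feats), dim=1)        # (B, N, 1)
        ctx = (attn * ctx_feats).sum(dim=1)                      # (B, feat_dim)
        spatial = self.spatial_mlp(box_pair_geom)                # (B, hidden_dim)
        fused = torch.cat([human_feat + ctx, object_feat, spatial], dim=-1)
        return self.classifier(fused)                            # action scores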
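The question-guided step of the bidirectional attention feature extraction can be sketched as follows: per-object region features from a detector are re-weighted under the guidance of an encoded question. Again this is a minimal PyTorch sketch under assumed dimensions (300-d word embeddings, a GRU question encoder); it is not the exact network used in the thesis.

# Hedged sketch of question-guided attention over detected object features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionGuidedAttention(nn.Module):
    def __init__(self, vis_dim=2048, q_dim=1024, hidden_dim=512):
        super().__init__()
        self.q_encoder = nn.GRU(300, q_dim, batch_first=True)   # word embeddings -> question vector
        self.vis_proj = nn.Linear(vis_dim, hidden_dim)
        self.q_proj = nn.Linear(q_dim, hidden_dim)
        self.attn = nn.Linear(hidden_dim, 1)

    def forward(self, region_feats, q_embeds):
        # region_feats: (B, K, vis_dim) per-object features from a detector
        # q_embeds:     (B, T, 300) word embeddings of the question
        _, q_state = self.q_encoder(q_embeds)                    # (1, B, q_dim)
        q = q_state.squeeze(0)                                   # (B, q_dim)
        joint = torch.tanh(self.vis_proj(region_feats)
                           + self.q_proj(q).unsqueeze(1))        # (B, K, hidden_dim)
        weights = F.softmax(self.attn(joint), dim=1)             # (B, K, 1)
        attended = (weights * region_feats).sum(dim=1)           # (B, vis_dim)
        return attended, q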
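Finally, a fusion head in the spirit of ASM-Net could combine the attended visual feature, the question feature, and the action semantic feature to predict answer scores. The element-wise-product fusion and the answer-vocabulary size below are assumptions made for the sketch, not the thesis' exact design.

# Hedged sketch of an ASM-Net-style fusion head (assumed PyTorch).
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    def __init__(self, vis_dim=2048, q_dim=1024, act_dim=512,
                 hidden_dim=1024, num_answers=3129):
        super().__init__()
        self.vis_fc = nn.Linear(vis_dim, hidden_dim)
        self.q_fc = nn.Linear(q_dim, hidden_dim)
        self.act_fc = nn.Linear(act_dim, hidden_dim)
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_answers))

    def forward(self, vis_feat, q_feat, act_feat):
        # element-wise product fuses visual and question features;
        # the action semantic feature is added as an extra cue
        joint = torch.tanh(self.vis_fc(vis_feat)) * torch.tanh(self.q_fc(q_feat))
        joint = joint + torch.tanh(self.act_fc(act_feat))
        return self.classifier(joint)                            # answer scores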
Keywords/Search Tags:attention mechanisms, action semantic understanding, multimodal visual question-answering methods