
Attention & Meta-learning Based Visual Question Answering

Posted on: 2022-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: G Y Li
Full Text: PDF
GTID: 2518306524490254
Subject: Master of Engineering
Abstract/Summary:
Visual question answering (VQA) has been a hot research area in deep learning. The task is defined as follows: a VQA system involves both visual and textual processing, taking a natural image and a free-form natural language question as input and generating a natural language answer as output. Current VQA methods are usually built on object detection models, which are computationally slow and lack interpretability; their training also relies on large sample sets and lacks the ability to learn from few samples. In this thesis, to reduce the computational cost, image features are extracted with a pure Transformer structure or with convolution combined with a Transformer, and the key information in the features is extracted with an attention method. At the same time, a meta-learning method further improves the few-shot learning ability. The main research contents of this thesis are as follows:

Firstly, this thesis re-examines the influence of different visual feature extraction methods and finds that convolution and Transformer layers can replace the region-selection and region-feature-computation modules, which greatly improves computational efficiency. Compared with traditional VQA methods, this approach also has higher explainability: by visualizing the attention information in the model, one can clearly see the important regions in the image and the important words in the question during inference.

Secondly, traditional VQA methods rely on large training sets, while the types and forms of questions in the VQA task are unpredictable, so traditional methods lack the ability to deal with unfamiliar questions. To enhance the few-shot learning ability, this thesis groups questions according to their similarity and compares a group of similar questions through a meta-learning method, so as to infer the likelihood that these questions share the same answer.

In general, this thesis mainly uses attention-based methods to extract textual and visual information, realizes a multi-modal co-attention mechanism, and enhances accuracy in the few-shot setting through meta-learning. Finally, experiments show that the proposed model is superior to traditional VQA methods in both accuracy and computational efficiency.
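As a rough illustration of the first contribution, the sketch below shows how a strided-convolution patch embedding plus Transformer encoders can stand in for region selection and region feature computation, with a cross-modal attention step whose weights can be visualized. This is a minimal PyTorch sketch under assumed settings: the class name CoAttentionVQA, all dimensions, layer counts, and vocabulary sizes are hypothetical and not taken from the thesis.

import torch
import torch.nn as nn

class CoAttentionVQA(nn.Module):
    def __init__(self, dim=256, num_heads=4, vocab=10000, num_answers=1000):
        super().__init__()
        # A strided convolution splits the image into 16x16 patches and
        # projects each patch to a dim-d token -- no region proposals needed.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        self.word_embed = nn.Embedding(vocab, dim)
        # Self-attention within each modality, then cross-modal co-attention.
        self.img_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, num_heads, batch_first=True),
            num_layers=2)
        self.txt_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, num_heads, batch_first=True),
            num_layers=2)
        self.co_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(dim, num_answers)

    def forward(self, image, question_ids):
        patches = self.patch_embed(image).flatten(2).transpose(1, 2)  # (B, N, dim)
        v = self.img_encoder(patches)
        q = self.txt_encoder(self.word_embed(question_ids))
        # Question tokens attend to image patches; attn_weights is a
        # per-word-per-patch map that can be visualized for interpretability.
        fused, attn_weights = self.co_attn(query=q, key=v, value=v)
        logits = self.classifier(fused.mean(dim=1))
        return logits, attn_weights

model = CoAttentionVQA()
logits, attn = model(torch.randn(2, 3, 224, 224), torch.randint(0, 10000, (2, 12)))

Because the patch embedding and encoders run in one dense forward pass, the cost of per-region proposal and feature extraction is avoided, which matches the abstract's efficiency claim.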
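The second contribution is described only at a high level, so the following is one possible reading in a relation-network-style metric-learning form; the names AnswerRelation and episode_loss are invented for illustration. Within a group of similar questions, a small network scores how likely a query question is to share its answer with each labelled support question.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AnswerRelation(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        # Scores a (query, support) pair of fused question-image embeddings.
        self.score = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, support, query):
        # support: (S, dim) embeddings with known answers
        # query:   (Q, dim) embeddings with unknown answers
        S, Q = support.size(0), query.size(0)
        pairs = torch.cat([query.unsqueeze(1).expand(Q, S, -1),
                           support.unsqueeze(0).expand(Q, S, -1)], dim=-1)
        return self.score(pairs).squeeze(-1)  # (Q, S) same-answer logits

def episode_loss(logits, support_answers, query_answers):
    # A query should match exactly those support items sharing its answer.
    target = (query_answers.unsqueeze(1) == support_answers.unsqueeze(0)).float()
    return F.binary_cross_entropy_with_logits(logits, target)

Training over many such episodes, each built from one similarity group, would teach the model to transfer answer evidence between related questions even when each individual question type has few labelled examples.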
Keywords/Search Tags: VQA, vision transformer, attention, meta-learning, few-shot learning