
Dynamic Capsule Attention For Visual Question Answering

Posted on: 2020-11-15
Degree: Master
Type: Thesis
Country: China
Candidate: W Q Chen
GTID: 2428330575464614
Subject: Computer technology
Abstract/Summary:
In current research, convolutional neural networks are the usual means of obtaining highly discriminative visual features in computer vision tasks, while natural language processing tasks model sequential text data effectively with recurrent neural networks. Both fields have seen significant performance gains, but Visual Question Answering (VQA), a cross-disciplinary task, still faces enormous challenges. In VQA, the model must answer arbitrary questions about a given image. This requires not only a full understanding of both the visual and the textual information, but also the ability to locate the specific visual information that the current state of the task calls for.

The visual attention mechanism lets a model weigh the importance of every given feature against some reference information and then form a compact feature that is highly correlated with that reference. In VQA, attention helps the model assess the importance of each image region based on the question and so obtain a compact visual feature. However, as tasks become harder, a traditional single-layer attention model finds it increasingly difficult to assess importance effectively. Some recent work has therefore used multi-layer attention mechanisms to perform multi-step reasoning. Although multi-layer attention networks can improve performance, they sharply increase the number of model parameters.

Inspired by the recently proposed Capsule Network, this thesis proposes a dynamic capsule attention mechanism. The traditional multi-layer attention mechanism applies attention at several separate attention layers and updates their weight values with the backpropagation (BP) algorithm. By contrast, dynamic capsule attention carries out the multi-step attention operation iteratively with only one attention layer, and updates the weight values through routing updates. The algorithm builds a joint representation of the visual and question features by accumulating it over multiple iterations, and uses this joint representation as its output. As a result, the dynamic capsule attention mechanism has fewer parameters than a traditional multi-layer attention algorithm.

For the VQA task, this thesis compares dynamic capsule attention with the traditional multi-layer attention mechanism. Experiments show that the proposed algorithm reduces the number of model parameters and improves the model's compactness and robustness while maintaining strong reasoning ability. To verify that the algorithm is portable, the thesis also applies dynamic capsule attention to image captioning. The results show that it achieves the same or better performance, judged both quantitatively and by visualization.
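The contrast between stacked multi-layer attention and a single attention layer reused over routing iterations, and the parameter saving that follows, can be illustrated with a small NumPy sketch. It assumes dot-product scoring between projected region features and a projected question, and it borrows the routing-by-agreement update of the original Capsule Network (coupling coefficients from a softmax over routing logits, the squash non-linearity, logits incremented by agreement). The function names, the question-guided initial logits, and the accumulation rule are hypothetical illustrations, not the thesis's exact formulation.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over a 1-D vector.
    e = np.exp(x - x.max())
    return e / e.sum()


def squash(s: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    # Capsule-style non-linearity: keeps the direction, maps the length into [0, 1).
    norm_sq = float(s @ s)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)


def init_layer(d: int, rng: np.random.Generator) -> dict:
    # Assumed layer shape: one visual projection and one question projection.
    scale = 1.0 / np.sqrt(d)
    return {"W_v": rng.standard_normal((d, d)) * scale,
            "W_q": rng.standard_normal((d, d)) * scale}


def count_params(layers: list) -> int:
    # Total number of learnable scalars across all layers.
    return sum(w.size for layer in layers for w in layer.values())


def stacked_attention(V: np.ndarray, q: np.ndarray, layers: list) -> np.ndarray:
    # Multi-layer baseline: every hop has its own weights, so K hops cost K layers.
    u = q
    for layer in layers:
        scores = (V @ layer["W_v"]) @ (u @ layer["W_q"])  # one score per region
        c = softmax(scores)
        u = u + c @ V  # refine the query with the attended visual feature
    return u


def dynamic_capsule_attention(V: np.ndarray, q: np.ndarray, layer: dict,
                              iterations: int = 3):
    # Hypothetical routing-style variant: ONE layer is reused at every iteration.
    U = V @ layer["W_v"]  # region "capsules", shape (n, d)
    u = q @ layer["W_q"]  # projected question, shape (d,)
    b = U @ u             # initial routing logits = question-guided scores
    joint = u.copy()
    c = softmax(b)        # defined even if iterations == 0
    for _ in range(iterations):
        c = softmax(b)        # coupling coefficients act as attention weights
        v = squash(c @ U)     # routed (attended) visual vector
        joint = joint + v     # accumulate the joint question-visual representation
        b = b + U @ v         # routing update: favour regions that agree with v
    return joint, c


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, K = 5, 8, 3  # regions, feature size, attention hops
    V = rng.standard_normal((n, d))
    q = rng.standard_normal(d)
    stacked = [init_layer(d, rng) for _ in range(K)]
    single = init_layer(d, rng)
    print(count_params(stacked), count_params([single]))  # 384 128
    joint, weights = dynamic_capsule_attention(V, q, single, iterations=K)
    print(joint.shape, weights.round(3))
```

With three hops the stacked baseline needs three times the weights of the single shared layer; in this sketch the capsule-style variant spends its extra steps on routing iterations rather than on new parameters, which mirrors the parameter argument made above.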
Keywords/Search Tags: Attention, Visual Question Answering, Image Caption