Research Of Visual Question Answering Technique Based On Deep Learning

Posted on: 2021-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Yi
Full Text: PDF
GTID: 2518306503980359
Subject: Electronics and Communications Engineering
Abstract/Summary:
Visual question answering is a task at the intersection of computer vision and natural language processing. It requires a machine not only to understand images and questions but also to reason about them. The task is significant for exploring how machine intelligence can be realized and for building a consistent representation of cross-modal data. This work improves the effectiveness of a visual question answering model by using cross-modal retrieval methods. Existing methods normally neglect the discrepancy between image features and text features, which are extracted by different networks and follow separate distributions. The proposed model matches image features with text features, unifying them in a common representation space based on their high-level semantics. The model then uses an attention mechanism to fuse the processed features and a classifier to produce the final answer. The cross-modal retrieval method reduces the complexity the model faces in finding connections between two different vector spaces and allows it to focus on generating answers from the joint features. The proposed model requires the dataset to provide additional annotations for the image captioning task. It performs well on the relevant dataset.
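The pipeline described in the abstract (project both modalities into a common space, fuse them with attention, then classify) can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation: the function name vqa_forward, the linear projections W_img, W_txt and W_cls, the cosine-similarity attention, the element-wise fusion and all dimensions are assumptions chosen for illustration, and the weights are random rather than trained. The caption-based training of the common space mentioned in the abstract is not shown.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()


def vqa_forward(regions, question, W_img, W_txt, W_cls):
    """Hypothetical forward pass: align, attend, fuse, classify.

    regions:  (k, d_img) image region features from a vision network
    question: (d_txt,)   question feature from a text network
    """
    # Assumed alignment step: linear projections into a shared d-dimensional space.
    v = l2_normalize(regions @ W_img)   # (k, d)
    q = l2_normalize(question @ W_txt)  # (d,)

    # Attention: weight each region by its similarity to the question.
    attn = softmax(v @ q)               # (k,)
    attended = attn @ v                 # (d,)

    # Assumed fusion: element-wise product of attended image and question features.
    joint = attended * q                # (d,)

    # Classifier over a fixed set of candidate answers.
    logits = joint @ W_cls              # (n_answers,)
    return attn, logits


# Toy sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
k, d_img, d_txt, d, n_answers = 5, 16, 12, 8, 4
regions = rng.normal(size=(k, d_img))
question = rng.normal(size=d_txt)
W_img = rng.normal(size=(d_img, d))
W_txt = rng.normal(size=(d_txt, d))
W_cls = rng.normal(size=(d, n_answers))

attn, logits = vqa_forward(regions, question, W_img, W_txt, W_cls)
answer = int(np.argmax(logits))
```

Because both feature sets are normalized in the same space before attention, the attention scores are plain cosine similarities, which is one simple way to read the abstract's claim that alignment lets the model concentrate on answer generation.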
Keywords/Search Tags: deep learning, visual question answering, cross modality, attention mechanism