Research On Visual Question Answering Method Based On Answer Mask

Posted on: 2023-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: F Y Shi
GTID: 2568306794452424
Subject: Electronic Information (Computer Technology) (Professional Degree)
Abstract/Summary:
The emergence of deep neural networks has injected new vitality into artificial intelligence research, and such networks are now applied with relative maturity in image recognition, object detection, speech recognition and other fields. As a new and popular research direction in artificial intelligence, the visual question answering (VQA) task has also benefited, and many research results have appeared in recent years. Because of language priors, however, the prediction accuracy of existing VQA models remains limited. Models learn the surface correlations between questions and answers but not the deeper relationships between them, and are therefore prone to producing irrelevant answers. This thesis proposes a method that uses an answer mask to cover irrelevant answers in the prediction results. The mask forces the model to attend to the correspondence between the question and the answer type, and improves prediction accuracy. The main work of this thesis is as follows:
⑴ Answer mask generation. To cover as many irrelevant answers as possible, the answer types in the VQA dataset are re-partitioned. Answer features are extracted and clustered to generate, for each answer type, an answer mask composed of 0s and 1s.
⑵ Answer type recognition model. A multi-class classification model is built with a convolutional neural network; text features are extracted with GloVe and passed to a classifier. The model is pre-trained so that it can accurately identify the answer type corresponding to each question.
⑶ Model fusion. The prediction results of the base VQA model are fused with the answer mask selected according to the answer type recognition model to obtain the final predicted answer. The answer mask reduces the influence of irrelevant answers on the result and alleviates the irrelevant-answer problem in VQA models.
Experiments on the proposed method show that the prediction accuracy of the CSS model incorporating it on the VQA-CP v2.0 dataset increases by 2.02%, reaching 60.14%, which places it among the best-performing methods at the time of writing.
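The mask-and-fuse idea described above can be illustrated with a minimal sketch. It is not the thesis implementation. The function names, the toy vocabulary, the hand-written cluster assignment and the choice of element-wise multiplication as the fusion rule are all assumptions made for illustration, since the abstract only states that predictions are "fused" with the selected mask. The scores are assumed to be non-negative (for example, probabilities).

```python
from typing import List, Sequence


def build_answer_masks(answer_to_cluster: Sequence[int], num_clusters: int) -> List[List[int]]:
    """Build one 0/1 mask per answer type (cluster).

    masks[c][i] is 1 when candidate answer i belongs to cluster c, else 0.
    The cluster assignment is taken as given here; in the abstract it comes
    from clustering answer features.
    """
    masks = [[0] * len(answer_to_cluster) for _ in range(num_clusters)]
    for answer_idx, cluster_id in enumerate(answer_to_cluster):
        masks[cluster_id][answer_idx] = 1
    return masks


def fuse_with_mask(answer_scores: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Zero out the scores of answers outside the selected type.

    Element-wise multiplication is an assumed fusion rule, not one confirmed
    by the source.
    """
    return [score * keep for score, keep in zip(answer_scores, mask)]


def predict(answer_scores: Sequence[float],
            type_scores: Sequence[float],
            masks: Sequence[Sequence[int]],
            vocab: Sequence[str]) -> str:
    """Pick the answer type, apply its mask, and return the best remaining answer."""
    predicted_type = max(range(len(type_scores)), key=type_scores.__getitem__)
    masked = fuse_with_mask(answer_scores, masks[predicted_type])
    best = max(range(len(masked)), key=masked.__getitem__)
    return vocab[best]


# Hypothetical toy data: six candidate answers in three answer types.
vocab = ["yes", "no", "2", "3", "red", "blue"]
answer_to_cluster = [0, 0, 1, 1, 2, 2]   # yes/no, number, colour
masks = build_answer_masks(answer_to_cluster, num_clusters=3)

# A base VQA model biased towards "yes" by language priors.
answer_scores = [0.40, 0.10, 0.05, 0.05, 0.30, 0.10]
# The answer type recognizer is confident the question asks for a colour.
type_scores = [0.1, 0.1, 0.8]

print(predict(answer_scores, type_scores, masks, vocab))  # prints "red"
```

Without the mask, the highest-scoring answer in this toy example is "yes"; with the colour mask applied, the yes/no and number candidates are suppressed and "red" is selected.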
Keywords/Search Tags: Deep learning, Visual question answering, Language prior, Answer clustering, Answer mask, Answer type identification