
Visual Question Answering Model Based On Answer Type Prediction

Posted on: 2021-04-12  Degree: Master  Type: Thesis
Country: China  Candidate: Y P Jia  Full Text: PDF
GTID: 2428330611998202  Subject: Software engineering
Abstract/Summary:
In recent years, question answering systems, one of the important markers of progress in artificial intelligence, have attracted wide attention from industry and academia. Applications such as personal assistants and intelligent customer service not only improve user stickiness but also help enterprises reduce labor costs, laying a solid foundation for research on question answering systems. With the continuous development of computer technology, people are no longer satisfied with voice and text as the only carriers of communication, and multimodal question answering has become a new research hotspot. As a typical multimodal question answering task, visual question answering has been widely studied by researchers at home and abroad. The main goal of this task is to correctly answer questions about a given picture. Although current visual question answering models achieve good performance, some questions still receive an irrelevant answer: for example, the question asks about a color, but the answer is a number. Such mismatches seriously reduce the reliability of a visual question answering model. This thesis takes answer type prediction as its starting point. First, the answer type is predicted from the question. The resulting category information is then integrated into the visual question answering model, so as to reduce mismatches between question and answer type and to improve the reliability and accuracy of the model. The main research work of this thesis is as follows.

(1) Construction of an answer type prediction model. The question-answer pairs in the visual question answering data set clearly fall into different types, but no corresponding labels are provided, so the data set is labeled first. A long short-term memory (LSTM) network and other deep learning techniques are then used to build the answer type prediction model, which extracts text features from the question and classifies them to obtain the final answer type.

(2) A visual question answering system based on deep learning. Convolutional neural networks, recurrent neural networks, attention mechanisms, and other deep learning techniques are used to build a visual question answering model. The model adopts an overall seq2seq architecture: computer vision techniques such as ResNet and object detection are used to extract image information, LSTM and other networks are used to extract question text features, and multiple multimodal fusion techniques are used to fuse the image and the question, from which the answer is finally obtained. The model achieves good results in the final comparative experiments.

(3) A visual question answering model based on answer type prediction. Building on the two parts above, the answer type prediction model is integrated with the visual question answering model. The answer type information is incorporated into the answer generation process by modifying the attention mechanism and the third modal fusion, so as to guide the generation of the answer by the overall model and reduce mismatches between answer and question type. The accuracy and overall performance of the final model are improved, which is in line with the expected results of this study.
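To make the idea of type-guided answering concrete, the following is a minimal sketch, not the method of the thesis. The thesis injects answer type information by modifying the attention mechanism and the modal fusion; the sketch instead uses a simpler late re-weighting of answer scores, purely for illustration. The answer vocabulary, the type labels, the scores, and the function name `rerank_by_answer_type` are all hypothetical.

```python
import math

# Hypothetical answer vocabulary, each candidate tagged with an answer type.
ANSWER_TYPES = {
    "red": "color",
    "blue": "color",
    "2": "number",
    "3": "number",
    "yes": "yes/no",
}


def softmax(scores):
    """Normalise a dict of scores into a probability distribution."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


def rerank_by_answer_type(answer_logits, type_probs):
    """Add log P(type | question) to each answer logit, then renormalise.

    Answers whose type is unlikely for the question are pushed down.
    This late re-weighting is an assumption made for illustration only.
    """
    adjusted = {
        ans: logit + math.log(type_probs.get(ANSWER_TYPES[ans], 0.0) + 1e-9)
        for ans, logit in answer_logits.items()
    }
    return softmax(adjusted)


# Made-up scores for the question "What color is the car?".
answer_logits = {"red": 1.0, "blue": 0.5, "2": 1.2, "3": 0.1, "yes": 0.0}
# Made-up output of an answer type classifier for the same question.
type_probs = {"color": 0.9, "number": 0.05, "yes/no": 0.05}

before = max(answer_logits, key=answer_logits.get)
after_probs = rerank_by_answer_type(answer_logits, type_probs)
after = max(after_probs, key=after_probs.get)
print(before, "->", after)
```

In this toy case the unadjusted model prefers the number "2" for a color question, and the type prior moves the top answer to a color. This is the kind of question/answer type mismatch the abstract describes.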
Keywords/Search Tags:Visual question answering, Multimodal, Answer type, Object detection, Modal fusion