
Research On Visual Question Generation Algorithm For Specific Scene

Posted on: 2021-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: L J Sun
GTID: 2428330611999435
Subject: Computer Science and Technology

Abstract/Summary:
In recent years, with the rapid development of deep learning, visual question generation (VQG) has become an important research topic. The task is to generate a question about a given image. VQG has significant research value for multimodal dialogue, early childhood education, and medical consultation. A survey of the current state of VQG research shows that most existing work falls short of the practical demands of specific scenes (such as early childhood education and medical consultation) in two main respects. First, most VQG models do not explicitly "point out" the image region that a generated question refers to. Second, most VQG models can only generate generic questions; that is, the question type (such as color or shape) is random. Such generic questions are often meaningless in a specific scene, so making a model generate questions of a specified category is an important research direction.

To address the first shortcoming, this paper first uses the FCLN (Fully Convolutional Localization Network) model as an auxiliary model to preprocess the input image, obtaining for each image a set of detected regions and a content description sentence for each region. It then proposes the extraction-generation-reinforcement learning (EGR) model, which consists of three sub-modules: the Extractor, the Generator, and Joint reinforcement learning. The Extractor applies an attention mechanism over all the content description sentences of each image, and the Generator then converts each content description sentence into a corresponding question. Experimental results show that the EGR model not only "specifies" the image sub-region corresponding to each question but also performs roughly on par with mainstream models on ROUGE and other metrics.

To address the second shortcoming, this paper proposes a specific-category question generation model based on a variational auto-encoder (VAE), consisting of an encoder network and a decoder network. The encoder takes the image, the question category, and the ground-truth question as input, encodes them with an attention mechanism, and maps them to a latent space. The main function of the decoder is to sample a vector from the latent space and reconstruct the input question. On the VQA dataset, the proposed model generally improves on current mainstream models across the evaluation metrics; for example, BLEU-4 increases by 1.61% and METEOR by 0.79%, indicating that it generates questions closer to human-written ones. In addition, the Strength and Inventive metrics increase by 5.04% and 9.64%, respectively, indicating that the proposed model generates more diverse questions.
Keywords: Visual Question Generation, Attention Mechanisms, Specific Category Question Generation
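The region-grounding idea behind the EGR model can be illustrated with a minimal sketch. It assumes each FCLN region description has already been embedded as a vector, scores every description against an image context vector with dot-product attention, and keeps the top-k descriptions together with their region indices, so each generated question stays tied to a region. The functions `extract` and `to_question`, the toy vectors, and the fixed template standing in for the learned Generator are all hypothetical; the Joint reinforcement learning stage is not shown.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def extract(context_vec, region_vecs, captions, k=2):
    """Hypothetical Extractor: attend over all region descriptions of one
    image and keep the k highest-weighted ones with their region indices."""
    weights = softmax([dot(context_vec, r) for r in region_vecs])
    ranked = sorted(range(len(captions)), key=lambda i: weights[i], reverse=True)
    return [(i, captions[i], weights[i]) for i in ranked[:k]]


def to_question(region_idx, caption):
    """Placeholder for the learned Generator: a fixed template that turns a
    region description into a question and records which region it targets."""
    return f"Is there {caption.rstrip('.')} in region {region_idx}?"


# Toy inputs: three FCLN-style regions with made-up 2-d embeddings.
context = [1.0, 0.0]
regions = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
captions = ["a red ball on the grass", "a blue sky", "a dog running"]

for idx, cap, w in extract(context, regions, captions, k=2):
    print(f"{w:.2f}", to_question(idx, cap))
```

Because the region index travels with each selected description, every output question can be traced back to a detected sub-region, which is the property the first shortcoming asks for.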
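The VAE-based specific-category model can likewise be sketched under simplifying assumptions: plain lists stand in for tensors, concatenation replaces the attention-based encoder, and the decoder is reduced to a reconstruction loss value supplied by the caller. The sketch shows the standard VAE pieces the description relies on: mapping the inputs (image, category, question) to a diagonal Gaussian over the latent space, sampling with the reparameterization trick, and the closed-form KL term in the training loss. All names (`encode`, `reparameterize`, `kl_to_standard_normal`, `vae_loss`) and weight shapes are hypothetical.

```python
import math
import random


def linear(x, weights, bias):
    """Affine map y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def encode(image_feat, category_onehot, question_emb, W_mu, b_mu, W_lv, b_lv):
    """Hypothetical encoder: concatenate image, category and question features
    (a stand-in for the attention-based encoder) and produce the mean and
    log-variance of a diagonal Gaussian q(z | image, category, question)."""
    h = image_feat + category_onehot + question_emb  # list concatenation
    return linear(h, W_mu, b_mu), linear(h, W_lv, b_lv)


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I))."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def vae_loss(recon_nll, mu, logvar):
    """Negative ELBO: the decoder's reconstruction negative log-likelihood of
    the question tokens (computed elsewhere) plus the KL regularizer."""
    return recon_nll + kl_to_standard_normal(mu, logvar)


# Toy example: 1-d image feature, 2 question categories, 1-d question embedding.
W_mu, b_mu = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.5]], [0.0, 0.0]
W_lv, b_lv = [[0.0] * 4, [0.0] * 4], [0.0, 0.0]
mu, logvar = encode([1.0], [0.0, 1.0], [0.5], W_mu, b_mu, W_lv, b_lv)
z = reparameterize(mu, logvar, random.Random(0))  # latent vector fed to the decoder
print(mu, logvar, z, vae_loss(3.2, mu, logvar))
```

In a typical conditional VAE, generation would draw z from the prior N(0, I) and feed it to the decoder together with the image and the requested category; how the model described above conditions its decoder is not specified in this summary.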