
Research On Multi-mode Question Answering Method

Posted on: 2023-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Bai
Full Text: PDF
GTID: 2558307163489684
Subject: Computer technology
Abstract/Summary:
Traditional question answering systems, such as XiaoAI, Siri, and customer-service chatbots, are designed for single-modality scenarios: their questions and answers contain only text or audio information. These systems have two main limitations: 1) they cannot answer multimodal questions posed by users, such as questions about a table or an image; 2) they cannot return multimodal answers, that is, answers that combine text and pictures. To address these two problems, this thesis designs a visual question answering model based on neural cellular automata and a multimodal question answering model based on reading comprehension. The specific research contents are as follows.

(1) For the scenario where users ask questions about images, this thesis proposes a visual question answering model based on neural cellular automata, which gives users accurate answers by jointly understanding visual and textual modality information. The model constructs a question encoding module that converts the question sentence into a text semantic vector, and an image encoding module that extracts the visual objects in the user's input image and converts them into visual semantic vectors. To solve the problem of cross-modal alignment between vision and text, the text and visual semantic vectors are organized into a multimodal cell graph, and a multimodal fusion vector is generated using a custom cellular "birth and death rule". For answer prediction, a classification layer serves as the decoder, selecting the best match from the candidate answers. Experiments on a visual question answering dataset show that this method improves both accuracy and interpretability compared with existing neural network methods based on CNNs or RNNs.

(2) For the scenario where users require answers containing both text and images, this thesis proposes a multimodal question answering model based on reading comprehension, which provides users with combined text-and-image answers on the basis of understanding the semantics of the question. The model designs two methods for generating the text answer. The first, "selective text answering", converts the question and paragraph into a joint embedding vector and predicts the text answer with a sequence labeling model. The second, "generative text answering", trains in two stages, unsupervised pre-training on a paragraph corpus followed by fine-tuning on question-answer data, and predicts the text answer with a generative model. For generating the image answer, the question and the text answer are converted into a joint embedding vector, and the most suitable image answer is matched among the candidate images by contrastive learning. Experiments on multimodal reading comprehension datasets show that this model can generate answers combining text and images for users, and improves accuracy and answer richness compared with existing methods.
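The cell-graph fusion described in (1) can be illustrated with a minimal sketch. This is a hypothetical simplification, not the thesis's actual model: the update weights here are random rather than learned, the cell graph is fully connected, and the "birth and death rule" is approximated by an activation-norm threshold that decides which cells stay alive.

```python
import numpy as np

rng = np.random.default_rng(0)

def nca_fuse(text_vec, visual_vecs, steps=4, alive_thresh=0.1):
    """Fuse a text semantic vector with visual object vectors on a
    multi-modal cell graph (illustrative sketch only).

    Each cell holds one modality vector; at every step a cell is
    updated from the mean message of the living cells, and a simple
    "birth and death rule" keeps a cell alive only while its
    activation norm stays above alive_thresh.
    """
    cells = np.vstack([text_vec] + list(visual_vecs)).astype(float)
    alive = np.ones(len(cells), dtype=bool)
    # Hypothetical update weights; in the real model these are learned.
    W = rng.normal(scale=0.1, size=(cells.shape[1], cells.shape[1]))
    for _ in range(steps):
        neigh = cells[alive].mean(axis=0)        # message from living cells
        update = np.tanh((cells + neigh) @ W)    # per-cell update rule
        cells = np.where(alive[:, None], cells + update, cells)
        alive = np.linalg.norm(cells, axis=1) > alive_thresh  # death rule
    return cells[alive].mean(axis=0)             # fused multimodal vector

# Toy usage: one 8-dim text vector and three 8-dim visual object vectors.
fused = nca_fuse(rng.normal(size=8), rng.normal(size=(3, 8)))
```

The fused vector would then be fed to a classification layer over the candidate answers, as the abstract describes.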
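The "selective text answer" method in (2) predicts a span of the paragraph with a sequence labeling model. A minimal sketch of the decoding side, assuming a standard BIO tag set (the tags would come from the trained tagger, not be hand-written as here):

```python
def extract_span(tokens, tags):
    """Recover the text answer from BIO tags produced by a sequence
    labeling model (sketch; the tagger itself is a learned model)."""
    answer = []
    for tok, tag in zip(tokens, tags):
        if tag == "B":
            answer = [tok]          # start a new answer span
        elif tag == "I" and answer:
            answer.append(tok)      # extend the current span
    return " ".join(answer)

# Toy usage with hand-written tags standing in for model output.
tokens = ["the", "capital", "is", "Paris", "."]
tags   = ["O",   "O",       "O",  "B",     "O"]
# extract_span(tokens, tags) == "Paris"
```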
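The image-answer step in (2) retrieves the candidate image whose embedding lies closest to the joint question-plus-text-answer embedding, in the style of contrastive retrieval. A sketch under the assumption of cosine similarity over already-computed embeddings (the encoders themselves are learned networks not shown here):

```python
import numpy as np

def match_image_answer(query_vec, candidate_vecs):
    """Pick the candidate image embedding with the highest cosine
    similarity to the joint question/text-answer embedding
    (illustrative sketch of the retrieval step only)."""
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    scores = c @ q                        # cosine similarities
    return int(np.argmax(scores)), scores

# Toy usage: candidate 1 exactly matches the query embedding.
query = np.array([1.0, 0.0, 0.0])
cands = np.array([[0.0, 1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.5, 0.5, 0.0]])
best, scores = match_image_answer(query, cands)
# best == 1
```

During training, a contrastive objective would push matched (query, image) pairs together and mismatched pairs apart in this embedding space.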
Keywords/Search Tags: Multimodal, Image Text Matching, Cellular Automata, Graph Neural Networks