
Research On Open Domain Question Generation Technology For Large-scale Knowledge Base

Posted on: 2021-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y Hu
Full Text: PDF
GTID: 2428330605961311
Subject: Computer application technology
Abstract/Summary:
Question generation (QG) over knowledge bases (KBs) focuses on generating simple questions that can each be answered by a single triple; it is the inverse of the task in knowledge-based question answering (KBQA). The input of KBQA is a natural-language question, and the answer is inferred from a large number of triples in the KB. Answering questions with KBs requires a large amount of labeled question-answer pairs; however, employing annotators to produce a large-scale, accurate standard dataset is very expensive, and the labeled data are also limited by factors such as domain. To reduce the labeling workload, QG has been proposed and has received increasing attention from both industry and academia. QG over KBs nevertheless faces many challenges: the large number of low-frequency words in triples causes the out-of-vocabulary (OOV) problem, and the input to a QG model is only a triple that lacks context information, which makes the generated questions lack diversity. This thesis therefore explores an open-domain QG algorithm over large-scale KBs that generates appropriate and informative questions in reverse from answer-bearing triples. The main contributions of this thesis are as follows.

To address the OOV problem caused by the many low-frequency words in the dataset, this thesis incorporates a copy mechanism into the generation framework. Because of defects in the internal computation of the copy mechanism, the prediction of common vocabulary is weakened; this thesis therefore improves the copy mechanism and constructs a QG model based on an attention-copy mechanism (AC-KBQG) that strengthens overall vocabulary generation while solving the OOV problem. In many experiments, the generated questions were found to have ambiguous intent. To clarify the intent of a question, the question type is introduced to strengthen the feature representation, so that the generated question has more accurate interrogative words.

Experiments were conducted on an English and a Chinese dataset, SimpleQuestions and NLPCC-KBQG 2018. The results of both automatic and human evaluation show that AC-KBQG performs better than the baseline models: it enhances the quality of general vocabulary generation while solving the OOV problem and also clarifies the intent of the questions. However, the generated questions still lack context information, so most of them are terse and lack diversity.

To improve the diversity of the generated questions, this thesis proposes a QG model based on a Graph Transformer network (GTN-KBQG). The model focuses on enhancing multi-granularity semantic feature representations of triples using two encoding layers: a graph-level layer based on the Graph Transformer and a word-level layer based on BERT enhancement. The entities and predicates in the KB are first assembled into a knowledge graph so that each entity receives a globalized vector, and, for this QG task, the input nodes are refined by exploiting the parallelism of the Transformer structure. To make full use of word-level semantics, the word sequence of the triple is first embedded with the pre-trained BERT model, and a bidirectional gated recurrent unit (GRU) network then produces context vectors. Finally, the outputs of the two encoding layers are combined into a more complete representation of the triple, which is fed into the decoding layer to predict the question. Experimental results on the English dataset SimpleQuestions show the effectiveness of GTN-KBQG.
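The copy-based generation described in the abstract can be illustrated with a minimal pointer-generator-style sketch. This is an illustrative assumption, not the thesis's exact AC-KBQG formulation: the decoder mixes a generation distribution over the fixed vocabulary with a copy distribution induced by attention over the source triple, so OOV source tokens remain reachable in the output distribution.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def copy_distribution(vocab_logits, attn_scores, src_ids, p_gen, vocab_size):
    """Mix a generation distribution over the fixed vocabulary with a copy
    distribution over source tokens (pointer-generator style; illustrative).

    vocab_logits: scores over the fixed vocabulary, shape (vocab_size,)
    attn_scores:  attention scores over source positions, shape (src_len,)
    src_ids:      extended-vocabulary ids of the source tokens; OOV tokens
                  are assigned ids >= vocab_size
    p_gen:        scalar in (0, 1), probability of generating from the vocab
    Returns a probability distribution over the extended vocabulary.
    """
    n_oov = max(0, max(src_ids) - vocab_size + 1)
    p_vocab = softmax(vocab_logits)        # generate-from-vocabulary part
    p_attn = softmax(attn_scores)          # copy-from-source part
    final = np.zeros(vocab_size + n_oov)
    final[:vocab_size] = p_gen * p_vocab
    for pos, tok_id in enumerate(src_ids):
        # attention mass on a source position flows to that token's id,
        # including ids outside the fixed vocabulary (OOV words)
        final[tok_id] += (1.0 - p_gen) * p_attn[pos]
    return final
```

Because copy probability is routed through attention, a low-frequency entity name appearing in the triple can be emitted even though it has no entry in the decoder vocabulary.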
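The graph-level encoding layer can likewise be sketched as one scaled dot-product self-attention step over the triple's nodes, followed by concatenation with word-level features. The sketch below is a simplification under stated assumptions: identity Q/K/V projections and a single head, whereas the actual Graph Transformer uses learned projections and multiple layers; the function names are hypothetical.

```python
import numpy as np

def graph_self_attention(node_vecs):
    """One scaled dot-product self-attention step over triple nodes
    (subject, predicate, object). Identity Q/K/V projections are used
    as a simplification; a real Graph Transformer learns them.
    node_vecs: (n_nodes, d) array. Returns contextualized (n_nodes, d) vectors.
    """
    d = node_vecs.shape[1]
    scores = node_vecs @ node_vecs.T / np.sqrt(d)   # pairwise node affinities
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ node_vecs                      # attention-weighted mixture

def fuse(graph_vecs, word_vecs):
    """Concatenate graph-level and word-level features, mimicking the
    combination of the two encoding layers (illustrative only)."""
    return np.concatenate([graph_vecs, word_vecs], axis=1)
```

Each output row is a convex combination of the input node vectors, so every node's representation is contextualized by the other elements of the triple before the fused features reach the decoder.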
Keywords/Search Tags:Question generation, Knowledge base, Question answering, Semantic representation