Research On Knowledge-based Question Answering And Question Generation

Posted on: 2020-10-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J W Bao
Full Text: PDF
GTID: 1368330590972977
Subject: Computer application technology
Abstract/Summary:
A question answering (QA) system is an agent that first tries to understand a natural language question and then provides an answer based on existing resources. QA systems are widely used in practical scenarios such as information retrieval, chitchat agents, personal speech assistants, and customer service agents. Traditional QA systems have several problems. First, they ignore knowledge when interpreting natural language questions. Second, for some specific domains, such as the medical and commercial domains, it is hard to obtain large-scale labeled data for building a QA system. To address these problems, this work proposes QA methods that fully utilize structured knowledge graphs and semi-structured tables. These QA systems can answer complex questions through deep semantic understanding of the question. Recently, question generation (QG), the dual problem of QA, has received much attention. This work also proposes QG methods based on knowledge graphs, tables, and unstructured text. QG systems can, on the one hand, provide necessary or additional data for QA systems and, on the other hand, be combined with QA systems so that the two improve each other.

First, research on question answering with knowledge graphs is introduced. A knowledge graph (KG) is high-precision structured knowledge, built on a carefully designed architecture with a great deal of manual effort, and is widely used by QA systems as background knowledge. Most existing datasets for QA with KGs (KG-QA) contain mainly simple single-relation questions rather than complex questions with multiple relations and special operations, so most work pays little attention to complex questions that require natural language understanding and inference. In this work, two KG-QA methods for complex questions are proposed. First, a translation-based KG-QA approach is proposed that merges semantic parsing and answer retrieval into one unified translation-based framework. The proposed method, which is
based on a chart-parsing algorithm, first answers sub-questions and then recursively answers higher-level questions until the entire complex question is answered, in a bottom-up decoding process. This method suits multi-hop questions with chain-like logical forms (LFs). However, it has limited ability to handle multi-constraint questions with star-like LFs. To address this problem, a second KG-QA method, a constraint-based approach, is proposed. We first systematically define an LF, the multi-constraint query graph (MulCG), and then propose an approach that constructs MulCGs for questions through constraint detection and binding. Experimental results for the two methods show that they improve the ability of KG-QA systems to answer complex questions.

Second, research on question answering with tables is introduced. Constructing a KG requires a great deal of manual effort, and a KG's coverage of real-world knowledge is limited. Semi-structured tables, a widespread form of knowledge on the web, are more lightweight than KGs, easier to obtain, and cover more domain-specific knowledge, which makes them a very important resource for QA research. Table-based QA (TB-QA) has therefore received growing attention. Existing TB-QA research either uses information retrieval (IR)-based methods to solve simple questions or adopts semantic parsing (SP)-based methods to solve complex questions. This work proposes using an IR-based method to solve complex questions, which not only improves a QA system's ability to solve complex questions but also alleviates the large search space problem that SP-based methods face. Specifically, four kinds of features are designed to encode common linguistic phenomena. Each pair of a complex question and an answer candidate is represented as a dense feature vector, on which a ranker is learned to rank the answer candidates. Experimental results show the effectiveness of our IR-based TB-QA approach.

Third, research on description and question generation with knowledge graphs and
tables is introduced. Training a KG-QA or TB-QA model usually requires large-scale training data, but obtaining such data usually costs much human effort. As the dual problem of QA, knowledge-based question generation (QG) systems can help alleviate the data shortage. We propose a neural network model called table-to-sequence (Table2Seq), which generates text and questions from KGs and tables. Since a KG triple can be transformed into a table with two columns and two rows, KG triples are referred to as tables in this context. The Table2Seq model contains an encoder, which fully considers the structural information of a single-row or multi-row table, and a decoder, which generates text or questions. To solve the rare word problem of a traditional decoder, Table2Seq leverages attention and copying mechanisms that make it possible to generate rare words. So that the model can generate text or questions with different patterns for different input tables, global and local information about the table is used in the decoder to help it distinguish different tables. Experimental results on four different datasets show the effectiveness of our Table2Seq model.

Finally, research on question generation with text is introduced. Compared with structured KGs and semi-structured tables, unstructured text is easier to obtain, larger in scale, and covers more knowledge. Recently, question generation from unstructured text has received much attention as a way to improve QA systems based on unstructured text, such as machine reading comprehension (MRC). To address the lack of labeled data in specific domains, this work proposes leveraging existing labeled data from source domains and unlabeled data from the target domain to learn a doubly-adversarial net (DoubAN) for question generation. DoubAN includes a question generator (QG), a domain classification discriminator (DC-Dis), and
a question answering discriminator (QA-Dis). During the doubly-adversarial training between QG and DC-Dis and between QG and QA-Dis, DoubAN fully utilizes model-generated data and learns domain-general representations of the input text to achieve text-to-question generation on the target domain. We conduct experiments on the SQuAD and NewsQA datasets, and the results verify that DoubAN can effectively generate questions in the target domain without labeled data.
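
The KG-QA part of the abstract distinguishes multi-hop questions with chain-like logical forms from multi-constraint questions with star-like logical forms. The following is a minimal sketch of that distinction only, not the chart-parsing method or the MulCG construction proposed in the work. The toy knowledge graph, the relation names, and the helper functions `answer_chain` and `answer_star` are all hypothetical and introduced purely for illustration.

```python
# Toy knowledge graph as a set of (subject, relation, object) triples.
# Every entity and relation name below is made up for illustration.
KG = {
    ("Alice", "born_in", "Paris"),
    ("Carol", "born_in", "Paris"),
    ("Bob", "born_in", "Lyon"),
    ("Paris", "capital_of", "France"),
    ("Alice", "profession", "writer"),
    ("Bob", "profession", "writer"),
    ("Carol", "profession", "painter"),
}


def objects_of(subjects, relation):
    """Follow `relation` forward from every entity in `subjects`."""
    return {o for (s, r, o) in KG if r == relation and s in subjects}


def subjects_of(relation, obj):
    """Return every entity that has `relation` pointing to `obj`."""
    return {s for (s, r, o) in KG if r == relation and o == obj}


def answer_chain(start, relations):
    """Chain-like query: answer the innermost sub-question first,
    then feed its answers into the next hop (bottom-up)."""
    current = {start}
    for relation in relations:
        current = objects_of(current, relation)
    return current


def answer_star(constraints):
    """Star-like query: each (relation, object) constraint restricts
    the same answer variable, so the candidate sets are intersected."""
    candidate_sets = [subjects_of(r, o) for (r, o) in constraints]
    return set.intersection(*candidate_sets) if candidate_sets else set()


# "Which country is the birthplace of Alice the capital of?" (two hops)
chain_answer = answer_chain("Alice", ["born_in", "capital_of"])

# "Which writer was born in Paris?" (two constraints on one variable)
star_answer = answer_star([("born_in", "Paris"), ("profession", "writer")])
```

In this sketch the chain query resolves one relation at a time, while the star query cannot be answered hop by hop because both constraints apply to the same unknown entity. That is the kind of question the abstract says the first method handles poorly.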
Keywords/Search Tags: question answering, question generation, structured knowledge graphs, semi-structured tables, unstructured text