
Research On Decomposition Technologies Of Complex Questions In Question Answering System

Posted on: 2020-08-30
Degree: Master
Type: Thesis
Country: China
Candidate: W Y Li
GTID: 2428330590973230
Subject: Computer Science and Technology
Abstract/Summary:
Question answering is an important research direction in natural language processing. It aims to build systems that automatically answer questions posed by humans in natural language. In our research, we find that many question answering systems that can answer simple questions (e.g., "When is China's National Day?") have difficulty with more complex questions (e.g., "What is Yao Ming's height and weight?", "What's the height of Yao Ming's daughter?"). We believe that many complex questions can be decomposed into several simple questions that a question answering system can answer. We therefore propose the task of complex question decomposition, which is divided into two subtasks: complex question classification and sub-question generation.

Because no Chinese corpus of complex questions is available, we collect a corpus from Baidu Zhidao, One Stop to the End and HotpotQA. We define four types of complex questions according to the syntactic structure of the questions and the way they are answered. We then specify the annotation rules in detail and construct data sets for both subtasks. The data sets contain more than 5,100 compound questions.

The complex question classification task aims to identify the different types of complex questions and to separate them from simple questions. We believe that complex questions have distinctive syntactic and semantic features. We compare three machine learning and deep learning methods: a bidirectional GRU classifier, a Tree-Kernel-SVM classifier and a classifier obtained by fine-tuning a pre-trained BERT model. The highest accuracy we achieve on this task is 0.9240.

The sub-question generation task aims to generate multiple simple questions that can be used to answer the original complex question. Unlike the rule-based or sequence-annotation-based methods used in previous studies, we adopt a sequence-to-sequence neural network model to generate the sub-questions. We modify the Pointer-Generator-Net model to improve the performance of sub-question generation. Because our data set is relatively small for a neural text generation task, we explore data augmentation of the training data, including replacing words using a paraphrase table and filling templates, and construct an augmented training set containing tens of thousands of examples. We also propose several other techniques to improve sub-question generation. Finally, on this task we achieve a ROUGE-L F score of 0.9376 and an accuracy of 0.57 under human evaluation.
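
For readers unfamiliar with the ROUGE-L F score reported above, the following is a minimal sketch of how such a score can be computed from the longest common subsequence (LCS) between a generated sub-question and a reference. The abstract does not state the tokenization or the weighting used in the thesis, so whitespace tokenization and beta = 1 (a plain F1) are assumptions here, and the function names `lcs_length` and `rouge_l_f` are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token sequences."""
    # Dynamic programming over one row at a time to keep memory small.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L F score.

    Assumptions (not taken from the thesis): tokens are split on
    whitespace, and beta defaults to 1.0 so the result is an F1.
    """
    cand = candidate.split()
    ref = reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    b2 = beta * beta
    return (1 + b2) * precision * recall / (recall + b2 * precision)


if __name__ == "__main__":
    generated = "what is yao ming 's height"
    gold = "what is yao ming 's weight"
    print(rouge_l_f(generated, gold))
```

Because ROUGE-L rewards token overlap in order, two sub-questions that differ in a single key word can still score highly, which may be one reason the abstract reports a human-evaluated accuracy alongside it.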
Keywords/Search Tags: Natural Language Processing, Question Answering System, Question Comprehension, Complex Question Decomposition