
Research Of Extractive Chinese Machine Reading Comprehension

Posted on: 2021-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: J Zeng
Full Text: PDF
GTID: 2428330605461386
Subject: Computer application technology
Abstract/Summary:
With the rapid development of artificial intelligence theory and technology, machine reading comprehension has become a research hotspot in both academia and industry. Machine Reading Comprehension (MRC) is the task in which a computer automatically answers questions about a given text. It can improve the accuracy and richness of question answering systems, and it also serves as one criterion for judging whether a machine understands human language. A machine's reading comprehension ability shows in two respects: 1) answering as many answerable questions as possible, and 2) identifying as many unanswerable questions as possible. Although MRC has made breakthroughs in recent years, many problems remain. First, current methods focus on improving the representation ability of general pre-trained language models and are not optimized for the characteristics of MRC, which limits their ability to answer questions. Second, current methods assume that the given text must contain an answer, so they cannot effectively identify questions that have no answer. To address these two deficiencies, this thesis proposes an MRC model based on a joint attention mechanism and an MRC model based on inference and verification. The main work of this thesis is as follows:

(1) This thesis proposes an MRC model based on joint attention (JointAtt-MRC), which adds an information interaction layer after the pre-trained language model to enhance the model's representation of the text. JointAtt-MRC uses a bidirectional Long Short-Term Memory (BiLSTM) network to alleviate the pre-trained language model's weakness in capturing local dependencies, and uses a joint attention mechanism to enhance the weight representation of self-attention. To address the small size and inconsistent structure of common Chinese datasets, we also construct a new Chinese machine reading comprehension dataset, named Chinese-SQuAD, containing 110,000 examples. It is derived from the English version of SQuAD, which was translated into Chinese by machine translation, and its format is consistent with SQuAD 2.0. Experimental results show that JointAtt-MRC achieves significant improvement on CJRC and Chinese-SQuAD compared with an MRC model based on a standard pre-trained language model.

(2) This thesis proposes an MRC model based on inference and verification (InferVerif-MRC), which simulates human reading habits and adds a pre-positioned inference module and a post-positioned verifier module to improve the accuracy of identifying unanswerable questions. When reading, a human first skims the whole text and judges whether the answer to the question can be found in it. The second step is to read the passage in detail to find the answer. The third step is to verify whether the answer is reasonable. A general end-to-end MRC model corresponds to the second step only. InferVerif-MRC simulates the first and third steps through the pre-positioned inference module (rough reading) and the post-positioned verifier module (validating plausibility), improving the model's ability to recognize unanswerable questions. Experimental results show that InferVerif-MRC improves on the single MRC model on both CJRC and Chinese-SQuAD.

(3) This thesis constructs an open-domain Chinese MRC system, which can find answers to questions in any domain from large-scale unstructured text. The system combines Information Retrieval (IR) and MRC technology to search for answers in a database and on the Internet. Question answering based on a knowledge graph requires a large-scale knowledge graph, question answering based on IR requires a large number of query-answer pairs, and question answering based on text generation produces answers that are neither accurate nor rich. By contrast, an MRC-based system does not need large-scale structured text to obtain more accurate answers.
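The three reading steps described in (2) can be illustrated with a minimal control-flow sketch. This is not the thesis's implementation: the function names, the shared threshold, and the word-overlap "models" are all hypothetical stand-ins chosen only to make the example runnable. A real system would presumably use neural modules for each step.

```python
from typing import Callable, List, Optional


def answer_with_verification(
    question: str,
    passage: str,
    infer_answerable: Callable[[str, str], float],
    extract_span: Callable[[str, str], str],
    verify_answer: Callable[[str, str, str], float],
    threshold: float = 0.5,  # hypothetical cut-off, not from the thesis
) -> Optional[str]:
    """Return an answer span, or None if the question is judged unanswerable."""
    # Step 1 (rough reading): can the passage answer the question at all?
    if infer_answerable(question, passage) < threshold:
        return None
    # Step 2 (detailed reading): extract a candidate answer span.
    candidate = extract_span(question, passage)
    # Step 3 (verification): reject empty or implausible candidates.
    if not candidate or verify_answer(question, passage, candidate) < threshold:
        return None
    return candidate


# Toy stand-ins based on word overlap, for illustration only.
def _words(text: str) -> List[str]:
    return text.replace("?", " ").replace(".", " ").split()


def toy_infer(question: str, passage: str) -> float:
    q = {w.lower() for w in _words(question)}
    p = {w.lower() for w in _words(passage)}
    return len(q & p) / max(len(q), 1)


def toy_extract(question: str, passage: str) -> str:
    q = {w.lower() for w in _words(question)}
    return " ".join(w for w in _words(passage) if w.lower() not in q)


def toy_verify(question: str, passage: str, candidate: str) -> float:
    return 1.0 if candidate in passage else 0.0


passage = "SQuAD was released in 2016."
print(answer_with_verification("When was SQuAD released?", passage,
                               toy_infer, toy_extract, toy_verify))
print(answer_with_verification("Who wrote Hamlet?", passage,
                               toy_infer, toy_extract, toy_verify))
```

With these toy components the first call returns the span `in 2016`, while the second returns `None` because the rough-reading step already judges the question unanswerable. Keeping the three steps as separate callables mirrors the description above, where the inference and verifier modules are added around an ordinary extractive reader.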
Keywords/Search Tags:Question Answering, Machine Reading Comprehension, Pre-training Language Model, Natural Language Inference, Deep Learning