
Research On Tibetan Extractive Machine Reading Comprehension Based On Deep Learning

Posted on: 2022-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: C F Chen
Full Text: PDF
GTID: 2518306332477514
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the continuous development of technology, natural language processing research has gradually shifted toward natural language understanding, and machine reading comprehension has emerged accordingly. It requires a machine not merely to perceive text but to understand it: to tag words by part of speech, to identify entity information through named entity recognition, to grasp grammatical structure through syntactic analysis, and to perform co-reference resolution. It is therefore one of the most suitable tasks for evaluating a machine's comprehension ability.

Firstly, this thesis constructs a high-quality Tibetan question answering dataset for machine reading comprehension (TibetanQA), which covers factual knowledge in 12 fields. On this dataset, the thesis conducts a multi-dimensional analysis in terms of question type, logical reasoning, answer type, and paragraph length, and evaluates the dataset with a language-feature ablation method. The experimental results show that TibetanQA probes word comprehension, semantic composition, internal context comprehension, and contextual relevance, and can therefore be used to evaluate a model's ability to comprehend Tibetan texts.

Secondly, this thesis proposes TQGR (Tibetan Question Generation based on Rewards), a sequence-to-sequence Tibetan question generation model with a reward mechanism. To handle out-of-vocabulary and low-frequency words, the model generates questions with a generation-copy mechanism and an attention mechanism; in addition, the reward mechanism scores the generated questions and feeds the scores back to the generator to optimize the fluency and contextual relevance of the questions. The experimental results show that the TQGR model reaches 38.54% on ROUGE-L, 11.41% higher than the traditional sequence-to-sequence model.

Thirdly, this thesis proposes Ti-Reader (Tibetan Reader), a Tibetan machine reading comprehension model based on a multi-level attention mechanism. First, it proposes a word vector representation that incorporates Tibetan syllable information. Second, to strengthen the model's comprehension ability, it adopts a multi-level attention structure: at the word level, an attention mechanism extracts keywords from the context, while at the sentence level, a re-reading mechanism captures key sentence information in the article. A self-matching mechanism then "reads" the article once more, so that key information is not missed when the question and the article differ in surface form. Experiments show that Ti-Reader achieves an F1 score of 77.4% on the Tibetan dataset, an increase of 14.0% over the baseline system.
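The generation-copy mechanism described for TQGR can be illustrated with a minimal numerical sketch in the style of a pointer-generator network. This is not the thesis's implementation; the function name, the gate `p_gen`, and all shapes are illustrative assumptions. The idea is to mix the generator's vocabulary distribution with a copy distribution derived from attention over the source tokens, so that rare or out-of-vocabulary source words can still be emitted.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def generation_copy_distribution(vocab_logits, attn_scores, src_token_ids, p_gen):
    """Mix a generation distribution over the vocabulary with a copy
    distribution over the source tokens.

    p_gen is the probability of generating from the vocabulary;
    (1 - p_gen) is the probability mass routed to copying source tokens.
    """
    vocab_dist = softmax(vocab_logits)   # shape: (vocab_size,)
    attn_dist = softmax(attn_scores)     # shape: (src_len,)
    final = p_gen * vocab_dist
    # Scatter-add the copy mass onto the vocabulary ids of the source
    # tokens; a token appearing several times accumulates its weights.
    np.add.at(final, src_token_ids, (1.0 - p_gen) * attn_dist)
    return final
```

A reward mechanism as described in the abstract would then score a decoded question (e.g. for fluency and contextual relevance) and feed that score back as a training signal, but the scoring model itself is not specified here.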
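The multi-level attention structure of Ti-Reader (word-level attention, sentence-level re-reading, and self-matching) can be sketched with plain dot-product attention. This is a simplified illustration under assumed shapes, not the model's actual architecture; the function names and the residual fusion are my own choices.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, keys, values):
    """Scaled dot-product attention: each query row gathers a weighted
    summary of the value rows."""
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d), axis=-1)
    return weights @ values

def multi_level_read(question_vecs, context_vecs):
    """Word-level attention aligns the context with the question; a
    self-matching pass then lets the context attend to itself, so
    evidence far apart in the article can be combined."""
    # Word level: question-aware context representation.
    q_aware = attend(context_vecs, question_vecs, question_vecs)
    fused = context_vecs + q_aware
    # Self-matching: the article "re-reads" itself.
    self_matched = attend(fused, fused, fused)
    return fused + self_matched
```

In the full model, the output representation would feed a span-prediction layer that marks the start and end of the answer in the context, which is the standard setup for extractive reading comprehension.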
Keywords/Search Tags: Tibetan machine reading comprehension, corpus construction, question generation, attention mechanism, deep learning