
Research For Chinese Reading Comprehension Based On Word Distributed Representation

Posted on: 2015-09-23
Degree: Master
Type: Thesis
Country: China
Candidate: S Zhang
GTID: 2308330461985028
Subject: Computational Mathematics
Abstract/Summary:
Natural Language Processing (NLP) is a core area of artificial intelligence. Within NLP, advances in Reading Comprehension (RC) technology help people obtain useful information precisely. The RC task aims to automatically extract the answer sentence from a natural-language article, given the article and an article-related question. In recent years there have been many studies on the RC task in China and abroad. They mainly compute match scores between a sentence and a question in an article based on one-hot word representations; however, few studies have introduced distributed word representations into the reading comprehension task.

In this thesis, we first train a word embedding matrix with a Neural Language Model (NLM). We then cast the reading comprehension task as a binary classification task, employing a maximum entropy model on the Chinese Reading Comprehension Corpus (CRCC), a dataset developed by Shanxi University. Using the distributed word representation matrix, we construct several features that measure the similarity between a question and a sentence in an article:

1) MAXOUT feature: the Euclidean distance between the element-wise maximum vectors of the embedding matrices of the question sentence and the answer sentence;
2) Arithmetic mean feature: the Euclidean distance between the mean vectors of the two embedding matrices;
3) Average word-pair similarity feature: the average over a matrix whose elements are the Euclidean distances between the vector of each word in the question and the vector of each word in the sentence;
4) Angle cosine feature: obtained by adding the angle cosine to the MAXOUT feature.

Because the CRCC is a small dataset, we use held-out validation to conduct the experiments.
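The four similarity features can be sketched in NumPy as follows. This is a hypothetical reimplementation from the feature descriptions only; the function name, the exact cosine variant, and any scaling are assumptions, not the thesis's actual code:

```python
import numpy as np

def sentence_features(Q, S):
    """Similarity features between a question matrix Q and a candidate
    answer-sentence matrix S (one row per word embedding vector).
    Illustrative only; the thesis's exact formulas may differ."""
    # 1) MAXOUT: Euclidean distance between element-wise maximum vectors
    maxout = float(np.linalg.norm(Q.max(axis=0) - S.max(axis=0)))
    # 2) Arithmetic mean: Euclidean distance between the mean vectors
    mean_dist = float(np.linalg.norm(Q.mean(axis=0) - S.mean(axis=0)))
    # 3) Average word-pair similarity: mean of all pairwise word distances
    pair_dists = np.linalg.norm(Q[:, None, :] - S[None, :, :], axis=2)
    avg_pair = float(pair_dists.mean())
    # 4) Angle cosine between the two element-wise maximum vectors
    q_max, s_max = Q.max(axis=0), S.max(axis=0)
    cosine = float(q_max @ s_max /
                   (np.linalg.norm(q_max) * np.linalg.norm(s_max)))
    return {"maxout": maxout, "mean": mean_dist,
            "avg_pair": avg_pair, "cosine": cosine}
```

For identical question and sentence matrices, the distance-based features are zero and the cosine is one, which is a quick sanity check on the implementation.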
We segment the corpus into five training/test splits and use HumSent accuracy to evaluate model performance. Initial results are obtained with the unscaled word embedding matrix; scale optimization is then applied to the matrix, and the embedding matrix and features are selected to improve performance on the training and test sets. The results show that adding the optimized word embedding matrices to the maximum entropy model, trained with 11 features, yields a HumSent accuracy of 63.37%. Based on a character embedding, a HumSent accuracy of 63.81% is obtained, an improvement of 2.07%. Overall, distributed word representation embedding matrices can improve the performance of the RC model to a certain extent.
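The classification step can be sketched as a minimal binary maximum-entropy model (equivalently, logistic regression) trained by gradient descent over similarity features like those above. This is an illustrative implementation under assumed hyperparameters, not the toolkit or the 11-feature setup actually used in the thesis:

```python
import numpy as np

def train_maxent(X, y, lr=0.1, epochs=2000):
    """Train a binary maximum-entropy (logistic regression) model.
    X: (n_samples, n_features) feature matrix; y: 0/1 labels
    (1 = the sentence answers the question)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # P(answer | features)
        # Gradient of the average log-loss
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * float(np.mean(p - y))
    return w, b

def predict(w, b, X):
    """Label each candidate sentence as answer (1) or not (0)."""
    return (1.0 / (1.0 + np.exp(-(X @ w + b))) >= 0.5).astype(int)
```

In the thesis's setting, each row of X would hold the feature values for one question–sentence pair, and the predicted positive sentence per question would be scored against the gold answer to compute HumSent accuracy.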
Keywords/Search Tags: Reading comprehension, Maximum entropy model, Neural language model, Distributed word representation