
Research And Implementation Of Grammatical Error Correction Based On Recurrent Neural Network

Posted on: 2019-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: L Yang
Full Text: PDF
GTID: 2348330545961549
Subject: Intelligent Science and Technology
Abstract/Summary:
English is the most widely used international language in the world, and a growing number of learners study it as a second language (English as a Second Language, ESL). ESL learners face challenges in listening, speaking, reading and writing because of differences in culture, geography and living habits. Of these, writing is the most important and the most difficult, because learner text may contain many grammatical errors. English Grammatical Error Correction (GEC) is therefore extremely important for both English learners and English teachers.

In this thesis, we propose a sequence labeling model based on a recurrent neural network to solve the sequence labeling problem for ESL corpora, which are full of grammatical errors. We then propose two grammatical error correction methods: one based on the sequence labeling model and one based on a sequence-to-sequence model.

First, our sequence labeling model raises part-of-speech tagging accuracy on an ESL corpus to 96.73% and reaches 97.60% on the WSJ (Wall Street Journal) corpus; on the CoNLL-2003 named entity corpus, it reaches an F1 value of 91.38%. Second, we apply the sequence labeling model to the GEC task. It reaches an F1 value of 38% on determiner error correction, better than UIUC, the best result in the CoNLL-2013 GEC task (33.40%), and an F1 value of 28.89% on preposition errors, better than UIUC (7.22%). Finally, we build a sequence-to-sequence model for the GEC task on top of our sequence labeling model. On the latest CoNLL-2014 GEC data, it reaches an F0.5 value of 31.77% and a recall of 38.92%, higher than that of CAMB (30.10%), the best result in the CoNLL-2014 GEC task.

The contributions of this thesis can be summarized as follows:
1. We propose a neural network model that effectively solves the sequence labeling problem. The network maintains high labeling accuracy both on standard corpora such as news text and on non-standard corpora such as English essays written by ESL learners. Unlike previous labeling models, our model uses character-level, word-level and sequence-level information, introduces coarse ("rough") supervision, and divides the labeling process into two stages, which makes labeling more robust.
2. We propose a method for detecting and correcting English grammatical errors using our sequence labeling model. This method surpasses the best result in the CoNLL-2013 GEC evaluation.
3. We propose a method for detecting and correcting English grammatical errors using our sequence-to-sequence neural network. The encoder of this model comes from our sequence labeling model, and an attention mechanism is introduced into the decoder.
4. We design and implement an English grammatical error detection and correction system that uses both our sequence labeling model and our sequence-to-sequence model.
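The F1 and F0.5 values quoted in the abstract are both instances of the F-beta measure, a weighted harmonic mean of precision and recall in which beta = 0.5 weights precision more heavily than recall. The following is a minimal sketch of that formula only; it does not reproduce the edit matching done by the shared-task scorer, and the function name `f_beta` and the precision/recall values are hypothetical, chosen purely for illustration.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Return the F-beta score: (1 + b^2) * P * R / (b^2 * P + R)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero, so the score is defined as zero.
        return 0.0
    return (1 + b2) * precision * recall / denominator


# Hypothetical values, not taken from the thesis.
precise_system = f_beta(precision=0.40, recall=0.20, beta=0.5)
recall_heavy_system = f_beta(precision=0.20, recall=0.40, beta=0.5)

# With beta = 1 the two systems tie; with beta = 0.5 the more precise one wins.
balanced_a = f_beta(precision=0.40, recall=0.20, beta=1.0)
balanced_b = f_beta(precision=0.20, recall=0.40, beta=1.0)

print(f"F0.5 (P=0.40, R=0.20): {precise_system:.4f}")
print(f"F0.5 (P=0.20, R=0.40): {recall_heavy_system:.4f}")
print(f"F1 for either system:  {balanced_a:.4f}")
```

Because F0.5 rewards precision, a correction system that proposes fewer but more reliable edits can score higher on F0.5 than one with the same F1 but higher recall.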
Keywords/Search Tags: grammatical error detection and correction, recurrent neural network, sequence labeling model, seq2seq model, ESL corpus