The Research Of Grammar Error Correction

Posted on: 2017-09-27
Degree: Master
Type: Thesis
Country: China
Candidate: X H Wang
GTID: 2348330518493517
Subject: Intelligent Science and Technology
Abstract/Summary:
With increasing global integration, English, as a worldwide language, is highly valued by its learners. Of the four basic skills of English proficiency (listening, speaking, reading and writing), writing is widely regarded as the most broadly applied, the one that draws on the widest range of knowledge, and the most difficult to master. ESL (English as a second language) learners are especially likely to make grammatical errors in writing because of differences in culture and ways of thinking and the influence of their mother tongue, and these errors are among the hardest problems to solve. Automatic checking and correction of English grammatical errors combines natural language processing techniques with machine learning methods so that a computer can detect grammatical errors in English sentences and correct them.

This study focuses on corpus-based automatic rule extraction and uses it to build an approach that automatically checks and corrects grammatical errors in English passages with a corpus-based limited back-off strategy. First, a large number of English texts are collected with a web crawler. Second, the texts are preprocessed (cleaning, punctuation handling, POS (part-of-speech) tagging and other steps) and built into a corpus that can be queried in real time. Third, rules describing grammatical errors are obtained by applying the automatic rule extraction method to the training sets together with the corpus. Finally, grammatical errors are corrected using the limited back-off strategy.

On the CoNLL-2013 evaluation data for automatic grammatical error checking and correction, the method achieves an overall F1 of 0.3196, exceeding the first-place system's 0.3120. For article errors, the study reports an F1 of 0.3345, stated to surpass the best 2013 score (given as 0.4435). The results show that the method is effective in detecting and correcting grammatical errors.

The contributions of this study are as follows:

1. It proposes a method for automatically extracting grammatical rules from training sets and a corpus, and extracts 41,278 rules from essays written by ESL learners. Writing grammatical rules by hand is time-consuming and likely to be incomplete, and hand-written rules do not specifically target the errors that ESL learners make. The method presented in this study addresses these problems efficiently.

2. It implements a hybrid search over words and parts of speech and constructs a corpus that supports real-time queries. The corpus contains 16,618,045 sentences drawn from the New York Times, student compositions from the Pigai system, and the CoNLL-2013 training sets. It supports searches by word, by phrase, by part of speech, and by mixed word and part-of-speech patterns, which underpins both the corpus-based extraction of error rules and the queries issued during subsequent automatic grammar checking and correction.

3. It proposes a method that filters texts with a knowledge base in order to lower the rate at which fixed collocations are misjudged during grammatical error checking, and it establishes a list of fixed collocations for this purpose. Some phrases are idiomatic in the language yet do not follow regular grammar; flagging them as errors reduces the detection accuracy of the system, so the study uses the list of fixed collocations to lower the misjudgment rate.

4. It presents an algorithm that checks and corrects grammatical errors with the corpus-based limited back-off strategy. The algorithm ties the back-off process to the size of the context window, which allows the process to be controlled more precisely and improves system performance.
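The idea of tying back-off to the window size, together with the fixed-collocation filter, can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the toy corpus, the preposition confusion set, the collocation list, and the names `build_ngram_counts`, `is_protected` and `correct_at` are all assumptions made for illustration, and plain word n-gram counts stand in for the real-time corpus queries (the hybrid word and part-of-speech search is not modelled).

```python
from collections import Counter

# Toy data standing in for the real-time queryable corpus (hypothetical).
CORPUS = [
    "she is interested in music",
    "they are interested in the idea",
    "this is in large part true",
    "we depend on the result",
]
CANDIDATES = ["in", "on", "at", "of"]      # hypothetical confusion set
FIXED_COLLOCATIONS = [("at", "large")]     # hypothetical collocation list


def build_ngram_counts(sentences, max_n=5):
    """Count every word n-gram of length 1..max_n in the sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


COUNTS = build_ngram_counts(CORPUS)


def is_protected(tokens, pos):
    """Return True if tokens[pos] lies inside a listed fixed collocation."""
    for colloc in FIXED_COLLOCATIONS:
        size = len(colloc)
        for start in range(max(0, pos - size + 1), pos + 1):
            if tuple(tokens[start:start + size]) == colloc:
                return True
    return False


def correct_at(tokens, pos, max_window=2, min_window=1, use_filter=True):
    """Choose a replacement for tokens[pos] with limited back-off.

    The context starts at max_window tokens on each side. If no candidate
    is attested in that context, the window shrinks by one token per side,
    and the search stops at min_window (the limit on back-off).
    Returns (chosen_word, window_used); window_used is None if unchanged.
    """
    if use_filter and is_protected(tokens, pos):
        return tokens[pos], None           # fixed collocation: leave as is
    for window in range(max_window, min_window - 1, -1):
        left = tokens[max(0, pos - window):pos]
        right = tokens[pos + 1:pos + 1 + window]
        scores = {c: COUNTS[tuple(left + [c] + right)] for c in CANDIDATES}
        best = max(scores, key=scores.get)
        if scores[best] > 0:
            return best, window
    return tokens[pos], None               # no evidence: keep the original
```

With this toy data, `correct_at("they were interested on the idea".split(), 3)` finds no match at window 2 and backs off to window 1, where it selects "in". For "the suspect is at large", the collocation filter leaves "at" untouched, whereas calling the function with `use_filter=False` would wrongly replace it with "in". Stopping at `min_window` rather than backing off to single words keeps the correction from being driven by raw word frequency alone.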
Keywords/Search Tags:grammatical error correction, corpus, automatic rule extraction, limited back-off