
Grammatical Error Correction Based On N-Gram Model And Parsing

Posted on: 2018-09-21  Degree: Master  Type: Thesis
Country: China  Candidate: T Shen  Full Text: PDF
GTID: 2348330542952872  Subject: Computer technology
Abstract/Summary:
Electronic text contains many kinds of errors, which is a serious problem for researchers. Manual correction cannot keep pace with the rapid growth in the volume of electronic text, so automatic error detection and correction for English text is becoming increasingly important. Current Grammatical Error Correction (GEC) methods include rule-based methods, N-gram-based methods, and parsing-based methods, each of which has shortcomings. First, rule-based methods require a large rule base, and as hard rules are added, conflicts arise between rules, which can greatly reduce the efficiency and accuracy of error correction. Second, N-gram methods do not address long-distance dependencies or data sparsity. An N-gram model can only describe local relations within a sentence; when a dependency spans more words than the N-gram length, the correction algorithm cannot capture it. Conversely, when N is large enough to cover long-distance dependencies, data sparsity renders the algorithm ineffective. Finally, parsing-based methods cannot effectively correct local errors: when word usage is determined by certain local associations, these methods ignore them.

To address these problems, this thesis proposes a grammatical error correction algorithm based on parsing and the N-gram model. The main contributions are as follows. First, long sentences are divided into multiple clauses using dependency parsing; the probability of each clause is computed with the N-gram model, and the clause probabilities are then combined into the probability of the whole sentence. Second, LeftBigram, RightBigram, and Trigram are combined to build the N-gram model of the clauses. Finally, an error candidate set and an N-gram scoring method are adopted: for each candidate in the set, the frequencies of its N-grams in the corpus are computed, the candidate's score is obtained as a weighted sum of these frequencies, and the highest-scoring candidate is selected.

The experimental results show that the GEC method based on parsing and the N-gram model is feasible and effective.
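
The candidate-scoring step described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the toy corpus, the weights (0.25, 0.25, 0.5), and the reading of "Trigram" as the candidate together with its left and right neighbours are all assumptions made for illustration. The dependency-parsing step that splits a sentence into clauses is not shown; the sketch assumes a clause has already been tokenized.

```python
from collections import Counter


def count_ngrams(sentences):
    """Count bigram and trigram frequencies over tokenized sentences."""
    bigrams, trigrams = Counter(), Counter()
    for tokens in sentences:
        bigrams.update(zip(tokens, tokens[1:]))
        trigrams.update(zip(tokens, tokens[1:], tokens[2:]))
    return bigrams, trigrams


def score(tokens, i, candidate, bigrams, trigrams, weights=(0.25, 0.25, 0.5)):
    """Weighted sum of LeftBigram, RightBigram and Trigram frequencies
    for `candidate` placed at position i. The weights are hypothetical."""
    left = tokens[i - 1] if i > 0 else None
    right = tokens[i + 1] if i + 1 < len(tokens) else None
    w_left, w_right, w_tri = weights
    # A Counter returns 0 for unseen n-grams, including those with a None edge.
    return (w_left * bigrams[(left, candidate)]
            + w_right * bigrams[(candidate, right)]
            + w_tri * trigrams[(left, candidate, right)])


def best_candidate(tokens, i, candidates, bigrams, trigrams):
    """Return the highest-scoring candidate from the candidate set."""
    return max(candidates, key=lambda c: score(tokens, i, c, bigrams, trigrams))


# Toy corpus (invented for this example).
corpus = [
    ["he", "goes", "to", "school"],
    ["she", "goes", "to", "work"],
    ["they", "go", "to", "school"],
]
bigrams, trigrams = count_ngrams(corpus)

# Clause with a suspected agreement error at position 1.
clause = ["he", "go", "to", "school"]
correction = best_candidate(clause, 1, ["go", "goes", "going"], bigrams, trigrams)
```

On this toy corpus, "goes" is supported by both neighbouring bigrams and by the centred trigram, so it outscores "go" and "going". Raw counts are used here for simplicity; a real system would likely normalize or smooth them.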
Keywords/Search Tags: Grammatical Error Correction, N-gram model, parsing