Research On Automatic Detection And Correction Of English Grammar Errors Based On Machine Translation

Posted on: 2022-12-24
Degree: Master
Type: Thesis
Country: China
Candidate: X D Sun
GTID: 2518306770967899
Subject: Automation Technology
Abstract/Summary:
As a subtask of natural language processing, English grammatical error correction provides second-language learners with services such as automatic correction of grammatical errors and article polishing. Current mainstream English grammatical error correction methods are mainly data-driven or based on machine translation, and the scarcity of annotated corpora is one of the main factors limiting their performance. Given this scarcity, training a data augmentation model that balances the quality and quantity of data is particularly important. Pre-trained error correction models also matter for improving grammatical error correction performance: pre-training can usually exploit massive unlabeled data and improve downstream tasks by learning a context-sensitive representation of each word in the input sentence.

This thesis first explores how to combine the advantages of different data augmentation methods to automatically generate training data for English grammatical error correction. Second, it explores a machine translation-based English grammatical error correction method that, combined with a re-ranking strategy, improves the model's error correction performance. The main research work is as follows:

1. Design a novel machine translation-based data augmentation strategy. By analyzing the distribution of error types in existing learner corpora, a confusion set with high contextual relevance is built for common error types. A monolingual corpus is then noised using the confusion set together with hand-written rules to obtain synthetic data. Finally, the synthetic data is combined with the learner corpus to train a machine translation-based error generation model.

2. Design a machine translation-based English grammatical error correction model, and use a new optimization method to improve its performance. First, training data is synthesized with the error generation model proposed in this thesis and used to train the grammatical error correction model. Then, the grammatical error correction model corrects the source sentences in the learner corpus; its corrected outputs are paired with the manually annotated reference sentences to form "error-correct" sentence pairs, which are fed back to the error generation model for alternating training. By establishing the relationship between the grammatical error detection model and the grammatical error correction model, the error detection and correction ability of the model is improved.

3. Design a grammatical error detection model as a tool for correcting and optimizing the final results. A BERT-based English grammatical error detection model is trained, and its output is combined with edit-operation features and the output probability of the grammatical error correction model to re-score the candidate sentences produced by the error correction system. The candidate with the highest score is selected as the final output.
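The re-scoring step described in item 3 can be illustrated with a minimal sketch. The abstract does not state how the three signals are combined, so the linear combination, the weights `w_gec`, `w_ged`, and `w_edit`, and the names `Candidate`, `edit_count`, and `rerank` are all assumptions made for illustration. The detector score is passed in as a plain number here; in the thesis it would come from the BERT-based detection model.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Candidate:
    text: str            # candidate sentence produced by the correction system
    gec_log_prob: float  # log-probability from the correction model
    ged_score: float     # hypothetical detector score in [0, 1]; higher = fewer errors detected


def edit_count(source: str, candidate: str) -> int:
    """Count token-level edit operations (replace, insert, delete)."""
    matcher = SequenceMatcher(a=source.split(), b=candidate.split(), autojunk=False)
    return sum(1 for tag, *_ in matcher.get_opcodes() if tag != "equal")


def rerank(source, candidates, w_gec=1.0, w_ged=1.0, w_edit=0.1):
    """Return the highest-scoring candidate under an assumed linear combination."""
    def score(c: Candidate) -> float:
        # Reward model confidence and detector approval; lightly penalize heavy rewriting.
        return (w_gec * c.gec_log_prob
                + w_ged * c.ged_score
                - w_edit * edit_count(source, c.text))
    return max(candidates, key=score)


if __name__ == "__main__":
    src = "She go to school yesterday ."
    cands = [
        Candidate("She go to school yesterday .", -1.0, 0.10),
        Candidate("She went to school yesterday .", -1.2, 0.95),
        Candidate("She goes to the school yesterday .", -2.0, 0.40),
    ]
    print(rerank(src, cands).text)
```

In practice the weights would be tuned on a development set rather than fixed by hand, and the edit-operation features could be richer than a single count (for example, separate counts per edit type).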
Keywords/Search Tags: grammatical error correction, data augmentation, grammatical error detection, machine translation