Research And Implementation Of Error Correction Based On Pre-Trained Language Model

Posted on:2021-10-27

Degree:Master

Type:Thesis

Country:China

Candidate:P X Zhu

GTID:2555306914963149

Subject:Computer technology

Abstract/Summary:

The goal of grammatical error correction (GEC) is to use computer programs to detect and correct grammatical errors in English text. The prevailing approach treats it as a machine translation task, that is, "translating" sentences with grammatical errors into correct sentences. However, current methods have two problems: (1) The effectiveness of machine-translation-based methods depends on a large annotated corpus of grammatical errors, while existing corpora are small and time-consuming to collect and curate. (2) Detecting and correcting errors in a sentence depends not only on information within the sentence but also on its surrounding context. Most current methods ignore this context, so they cannot detect or correct grammatical errors that depend on it.

The main work of this thesis includes:

(1) To mitigate the small scale of the corpus, and inspired by the success of pre-trained language models in text classification and other tasks, this thesis incorporates the semantic representations learned by a pre-trained language model into the GEC task using the fusion method proposed by Zhu et al. (2019), and proposes an English grammatical error detection and correction method based on a pre-trained language model.

(2) To further mitigate the small scale of the corpus, this thesis uses back-translation to generate artificial noise data that expands the annotated corpus. To keep the artificially generated corpus from interfering with the real corpus, the artificial corpus is used to pre-train the detection and correction model and the real corpus is used to fine-tune it, yielding an English grammatical error detection and correction method based on artificial noise data.

(3) To capture the context of a sentence, a document-level encoder is added and integrated into the detection and correction model. To control the influence of context information on the target sentence, an attention layer and a control gate are added to the decoder, yielding an English grammatical error detection and correction method based on context information.

(4) An English grammatical error detection and correction system that integrates the language model and context information is designed and implemented.

The experiments use the public evaluation corpora CoNLL2014 and JFLEG and compare the results with state-of-the-art methods. The method based on the pre-trained language model achieves a precision of 63.09%, a recall of 39.45%, and an F0.5 of 56.34% on CoNLL2014, and a GLEU of 59.46% on JFLEG. After adding the artificial noise data and the document-level encoder, F0.5 and GLEU reach 61.19% and 61.22% respectively; the F0.5 is 4.12% higher than the result reported by Ge et al. (2018).
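For readers unfamiliar with the F0.5 metric, the short sketch below shows how it combines precision and recall, using the standard F-beta definition with beta = 0.5, which gives recall half the weight of precision. The function name `f_beta` and the variable names are illustrative and not taken from the thesis; only the precision and recall figures come from the abstract.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score: weighted harmonic mean of precision and recall.

    With beta = 0.5, recall counts half as much as precision.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both scores are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall reported in the abstract for CoNLL2014 (in percent).
reported_precision = 63.09
reported_recall = 39.45

f05 = f_beta(reported_precision, reported_recall)
print(round(f05, 2))  # 56.34, matching the F0.5 reported in the abstract
```

Because the score is closer to precision than to recall, a system that makes fewer but more reliable corrections is rewarded over one that corrects aggressively.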

Keywords/Search Tags:

grammatical error correction, seq2seq, pre-trained language model, back translation, document-level encoder

Related items

1	Research On The Automated Grammatical Error Correction Model Of English Composition
2	Research On Neural Machine Translation Based English Grammatical Error Correction
3	Research On The Method Of Chinese Spelling And Grammar Error Correction
4	Design And Implementation Of An Automatic Grammar Error Correction Model For English Text
5	Research In Grammatical Error Correction Based On Sequence Generation Model
6	Analysis And Application Of Grammatical Error Correction Model For English Learner
7	Research And Implementation Of Grammatical Error Correction
8	Grammatical Error Correction Based On Sequence Generation
9	A Study Of Multi-level Error Analysis And Error Correction Methods In English Writing
10	Research On Chinese Grammatical Errors Diagnosis Based On Pipeline Model