Research And Implementation Of Error Correction Based On Pre-Trained Language Model

Posted on: 2021-10-27
Degree: Master
Type: Thesis
Country: China
Candidate: P X Zhu
GTID: 2555306914963149
Subject: Computer technology
Abstract/Summary:
Grammatical error correction (GEC) aims to use computer programs to detect and correct grammatical errors in English text. The prevailing approach treats it as a machine translation task, that is, "translating" sentences with grammatical errors into correct sentences. However, current methods have two problems: (1) The performance of machine-translation-based methods depends on a large annotated corpus of grammatical errors, while existing corpora are small and time-consuming to collect and curate. (2) Detecting and correcting a sentence depends not only on information within the sentence but also on its surrounding context. Most current methods ignore this context, so they cannot detect or correct grammatical errors that depend on it.

The main work of this paper includes: (1) To alleviate the impact of the small corpus, and inspired by the success of pre-trained language models in text classification and other tasks, this paper incorporates the semantic representations learned by a pre-trained language model into the GEC task using the fusion method proposed by Zhu et al. (2019), and proposes an English grammatical error detection and correction method based on the pre-trained language model. (2) To further alleviate the impact of the small corpus, this paper uses back-translation to generate artificial noise data that expands the annotated corpus. To prevent the artificially generated corpus from interfering with the real corpus, the artificial corpus is used to pre-train the detection and correction model and the real corpus is used to fine-tune it, yielding an English grammatical error detection and correction method based on artificial noise data. (3) To capture the context of a sentence, a text-level encoder is added and integrated into the detection and correction model. At the same time, to control the influence of context information on the target sentence, an attention layer and a control gate are added to the decoder, yielding an English grammatical error detection and correction method based on context information. (4) An English grammatical error detection and correction system integrating the language model and context information is designed and implemented.

For the experiments, this paper evaluates the method on the public evaluation corpora CoNLL2014 and JFLEG and compares it with state-of-the-art methods. The method based on the pre-trained language model achieves a precision of 63.09%, a recall of 39.45%, and an F0.5 of 56.34% on CoNLL2014, and a GLEU of 59.46% on JFLEG. After adding the artificial noise data and the document-level encoder, the F0.5 and GLEU values reach 61.19% and 61.22% respectively; the F0.5 is 4.12% higher than the result reported by Ge et al. (2018).
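The F0.5 figure quoted for CoNLL2014 can be checked against the reported precision and recall. The sketch below assumes the standard F-beta definition, a weighted harmonic mean in which beta = 0.5 weights precision more heavily than recall. The function name `f_beta` is illustrative and is not taken from the thesis or from any scoring tool.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall (standard F-beta).

    Assumption: this is the textbook formula, not code from the thesis.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing is correct
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall reported for CoNLL2014 in the abstract, in percent.
score = f_beta(63.09, 39.45)
print(round(score, 2))  # 56.34
```

With beta = 0.5 the score sits closer to precision (63.09) than to recall (39.45), which reflects the usual GEC preference for avoiding wrong corrections over catching every error.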
Keywords/Search Tags: grammatical error correction, seq2seq, pre-trained language model, back translation, document-level encoder