
Research On Chinese Grammatical Error Correction Based On Sequence-to-Sequence Model

Posted on: 2022-08-01    Degree: Master    Type: Thesis
Country: China    Candidate: Z Q Qiu    Full Text: PDF
GTID: 2518306563479044    Subject: Computer Science and Technology
Abstract/Summary:
Grammatical error correction (GEC) is an important task in natural language processing that aims to detect and correct grammatical errors in text. With the development of deep learning and the explosive growth of data, the machine-translation paradigm has become the primary choice for the GEC task, and neural sequence-to-sequence (seq2seq) models have been widely used for it. Compared with alphabetic languages such as English, Chinese has many distinct characteristics. Moreover, there are fewer data sets for Chinese grammatical error correction, which limits the learning ability of seq2seq models. To address these problems, this paper further studies the Chinese GEC task on the basis of existing research. The main work of this paper is as follows:

(1) A two-stage model for Chinese GEC (TS-GEC) is proposed. The model consists of two independent sub-modules: a spelling-check sub-module based on a language model and a GEC sub-module based on a seq2seq model. The spelling-check sub-module corrects spelling errors in the given text, mainly non-word errors, while the seq2seq GEC sub-module corrects the remaining errors in the text, including grammatical errors and residual spelling errors. Exploiting the fact that the source sentence and the target sentence in the GEC task are in the same language, a recycle inference method based on the language model is proposed on top of the seq2seq model, which corrects multiple grammatical errors in a text through multiple rounds of inference. In addition, different initialization strategies are adopted for the embedding layers of the seq2seq model: pretrained word embeddings are used to initialize the embedding layer of the decoder, while the encoder's embedding layer is initialized randomly. This ensures that the word vectors learned by the encoder better reflect the characteristics of ungrammatical sentences and have stronger representation ability.

(2) A Chinese GEC model based on dynamic masking words (DMasking GEC) is proposed. The model is based on the Transformer, and a dynamic word-masking algorithm is introduced at the model's input stage, comprising four basic masking methods: random masking, random substitution, unk substitution, and reorder. During training, a group of masking methods is randomly selected from the four and used to add noise to the source sequence; the data set is thus modified in a small range to obtain more diverse training samples containing grammatical errors. To a certain extent, the dynamic masking algorithm alleviates the scarcity of training samples and error categories in the Chinese GEC task.

(3) Experiments are conducted on the NLPCC 2018 GEC public data set. The proposed TS-GEC and DMasking GEC models reach 31.01 and 33.71 on the F0.5 score, respectively, exceeding the best result of the NLPCC 2018 GEC task (F0.5 = 29.91) by 1.1 and 3.8 points. The experimental results demonstrate the effectiveness of the proposed models for the Chinese GEC task.
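The recycle inference idea of the TS-GEC model can be sketched as a loop that feeds the model's output back in as input until the sentence stops changing. This is a minimal illustration, not the thesis implementation: `correct_once` is a hypothetical stand-in for a single decoding pass of the trained seq2seq model, and the stopping criterion (convergence or a round limit) is an assumption.

```python
def recycle_inference(sentence, correct_once, max_rounds=3):
    """Re-correct a sentence over multiple rounds of inference.

    correct_once: a function performing one decoding pass of a GEC
    model (hypothetical interface; the thesis does not specify an API).
    Stops early once a round produces no further edits.
    """
    for _ in range(max_rounds):
        corrected = correct_once(sentence)
        if corrected == sentence:  # converged: no further edits
            break
        sentence = corrected
    return sentence


# Toy single-pass corrector that fixes only one error per call,
# mimicking a model that misses some errors in a single decode.
fixes = {
    "他去学校昨天了": "他昨天去学校了了",   # fixes word order, misses extra 了
    "他昨天去学校了了": "他昨天去学校了",   # removes the duplicated 了
}

def toy_correct_once(s):
    return fixes.get(s, s)

result = recycle_inference("他去学校昨天了", toy_correct_once)
# result == "他昨天去学校了": both errors fixed across two rounds
```

A single decoding pass of the toy model leaves one error behind; the multi-round loop is what lets all errors be repaired.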
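The dynamic word-masking step of DMasking GEC can be sketched as follows. The four operation names follow the thesis; the per-token noise probability, the subset-sampling scheme, and the `[MASK]`/`<unk>` symbols are illustrative assumptions, not details taken from the thesis.

```python
import random

def dynamic_mask(tokens, vocab, p=0.1, seed=None):
    """Noise a source token sequence for GEC training.

    A group of operations is randomly selected from the four basic
    masking methods (random masking, random substitution, unk
    substitution, reorder); each token is then perturbed with
    probability p by one operation from that group. The probability p
    and the special symbols are illustrative assumptions.
    """
    rng = random.Random(seed)
    tokens = list(tokens)
    methods = ["mask", "substitute", "unk", "reorder"]
    group = rng.sample(methods, k=rng.randint(1, len(methods)))
    for i in range(len(tokens)):
        if rng.random() >= p:
            continue  # leave this token untouched
        op = rng.choice(group)
        if op == "mask":
            tokens[i] = "[MASK]"          # random masking
        elif op == "substitute":
            tokens[i] = rng.choice(vocab)  # random substitution
        elif op == "unk":
            tokens[i] = "<unk>"           # unk substitution
        elif op == "reorder" and i + 1 < len(tokens):
            tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
    return tokens
```

Applied anew at every training step, the same clean source sentence yields different noised variants, which is what produces the more diverse error-containing training samples described above.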
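The F0.5 score used for evaluation above weights precision twice as heavily as recall, which suits GEC since wrong "corrections" are more harmful than missed ones. A minimal computation of the general F-beta formula (the precision/recall values in the demo are illustrative, not from the thesis):

```python
def f_beta(precision, recall, beta=0.5):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R).

    beta=0.5 favours precision, the standard setting for GEC
    evaluations such as the NLPCC 2018 shared task.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative numbers only: precision 0.45, recall 0.18
score = f_beta(0.45, 0.18)
# score ≈ 0.3462 — close to precision despite the much lower recall
```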
Keywords/Search Tags: Chinese grammatical error correction, sequence-to-sequence model, recycle inference, dynamic masking words