
Research In Grammatical Error Correction Based On Sequence Generation Model

Posted on: 2021-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: C C Wang
GTID: 2505306470967459
Subject: Computer technology
Abstract/Summary:
Grammatical error correction (GEC) is an important task in Natural Language Processing (NLP), with high research value and broad application prospects. With the development of artificial intelligence, more and more tasks have achieved good results, and research on GEC has advanced significantly as deep learning algorithms have been optimized and hardware has improved. In many GEC shared tasks, researchers continue to refine their methods and achieve good results. However, the data side still has many shortcomings. Most research relies heavily on public datasets, and because data is scarce, error correction models cannot break through their performance bottleneck. To address data scarcity and poor error correction performance, this thesis studies model optimization methods and data augmentation strategies by exploring the characteristics of GEC corpora. The ultimate goal is to improve error correction performance effectively. The main research contents are as follows.

(1) Grammatical error correction model based on the sequence-to-sequence architecture. To address the shortcomings of existing model structures, this work introduces the seq2seq architecture, which can handle variable-length text generation. A Markov chain algorithm for text sequence generation is then proposed. Ablation analysis and performance evaluation experiments verify the role that each module of the model plays in the error correction system.

(2) Grammatical error correction model enhanced with multi-head attention. In this work, the LSTM unit is replaced by multiple attention modules, which improves the parallel computing efficiency of the grammatical error correction model. In addition, a dynamic residual structure is proposed, which solves the vanishing-gradient problem caused by the depth of the model and improves the performance of the seq2seq error correction model. The evaluation results show that the method achieves the best performance on Chinese grammatical error correction data.

(3) Data augmentation methods for the grammatical error correction task. In this part, two customized data augmentation strategies are designed according to the characteristics of GEC corpora. The first is a rule-based corpus corruption method, which uses three operations to synthesize a pseudo-parallel corpus. The second is a data synthesis method based on back-translation: an error generation model is trained on reversed data (correct sentences as input, erroneous sentences as output) and then used to inject error noise into monolingual data, enlarging the dataset.

(4) Grammatical error detection model based on a Graph Neural Network (GNN). This work builds the basic framework of the error detection model from two aspects: the characteristics of sequence labeling and the advantages of graph network modeling. The proposed method, which models the dependency trees of erroneous sentences with a graph network, provides important features for error detection and effectively improves error detection performance.
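The rule-based corpus corruption in (3) can be illustrated with a minimal sketch. The abstract does not name the three operations, so the deletion, substitution, and insertion used here are assumptions, as are the function names `corrupt` and `make_pairs`, the probability parameters, and the toy vocabulary. The sketch only shows how clean sentences could be turned into (noisy source, clean target) pairs; it is not the thesis's actual procedure.

```python
import random


def corrupt(tokens, vocab, rng, p_delete=0.1, p_substitute=0.1, p_insert=0.1):
    """Return a noisy copy of `tokens` (hypothetical helper).

    Each token is deleted, substituted with a random vocabulary item,
    or kept; after a surviving token, a random item may be inserted.
    """
    noisy = []
    for tok in tokens:
        r = rng.random()
        if r < p_delete:
            continue  # deletion: drop the token entirely
        if r < p_delete + p_substitute:
            noisy.append(rng.choice(vocab))  # substitution
        else:
            noisy.append(tok)  # keep the original token
        if rng.random() < p_insert:
            noisy.append(rng.choice(vocab))  # insertion after this position
    return noisy


def make_pairs(sentences, vocab, seed=0, **probs):
    """Build (noisy source, clean target) pairs from tokenized sentences."""
    rng = random.Random(seed)  # fixed seed keeps the corpus reproducible
    return [(corrupt(sent, vocab, rng, **probs), list(sent)) for sent in sentences]


if __name__ == "__main__":
    vocab = ["the", "a", "is", "are", "in", "on"]
    clean = [["she", "is", "in", "the", "room"], ["they", "are", "on", "a", "bus"]]
    for source, target in make_pairs(clean, vocab, seed=42):
        print(" ".join(source), "=>", " ".join(target))
```

The clean sentence always serves as the target side, so a seq2seq model trained on such pairs learns to map corrupted text back to the original. For Chinese data the tokens would more likely be characters or segmented words than whitespace-separated strings.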
Keywords/Search Tags:Grammatical Error Correction, Error Detection, Sequence to Sequence, Data Augmentation, Natural Language Processing