
Deep Learning Based Automatic Grammar Error Correction

Posted on: 2020-03-05  Degree: Master  Type: Thesis
Country: China  Candidate: H Y Huang  Full Text: PDF
GTID: 2428330599451436  Subject: Computer technology
Abstract/Summary:
Automatic grammatical error correction (GEC) is one of the most difficult syntactic-analysis tasks in natural language processing. In everyday language, grammatical nuances are among the hardest things for a non-native speaker to grasp, and GEC in practice covers not only grammatical errors but also spelling and collocation errors. In recent years, with the development of deep learning, GEC has received considerable attention. Phrase-based methods built on statistical machine translation (SMT) treat GEC as a translation task from "bad" sentences to "good" ones, trained on parallel corpora similar to translation corpora. Beyond SMT, neural approaches use recurrent neural networks (RNNs), as well as convolutional neural networks (CNNs), to encode sentences and extract phrase-level semantic representations. These methods locate grammatical errors by building an encoder-decoder sequence-to-sequence (seq2seq) model that learns the semantic correspondence, and the differences in wording, between erroneous and correct sentences.

To learn fully from the data, supervised learning is the most common approach, but it requires large amounts of annotated data, and annotation is costly. Researchers have found that unsupervised learning on unlabeled data can mine valuable semantic information that helps downstream supervised tasks. Examples include pre-training on translation corpora, pre-training on long-text corpora, and general-purpose multi-task pre-training. Such pre-trained models have been evaluated on many tasks and can greatly improve model performance. Nevertheless, even with relatively new model architectures, GEC models of practical value remain unsatisfactory because error-correction corpora are scarce.

This study proposes a new stacked model structure and embeds pre-trained features rich in semantic information, yielding a multi-layer error-correction model that can accommodate multiple pre-training methods. The model corrects errors through multiple rounds of iteration, and it uses dual learning to generate additional training data, further alleviating the shortage of error-correction corpora. The overall framework captures correlations between words, the coherence of phrases, semantic matching, and the grammatical correctness of sentences. The staged structure makes each module easy to replace and extend. Experiments on open-source parallel error-correction corpora, together with real correction examples, show that the model achieves good results on academic datasets and is applicable to real-world scenarios. The framework can further integrate the weights of current pre-trained models and is highly scalable, a property not shared by other work, which adds to the significance and future research value of this study.
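The "bad to good" translation framing above rests on parallel corpora of (erroneous, corrected) sentence pairs. A minimal, purely illustrative sketch of such a corpus, with the corrections extracted as word-level edit operations via the standard-library `difflib` (the example sentences and the helper `extract_edits` are hypothetical, not from the thesis):

```python
import difflib

# Toy parallel GEC corpus: each pair maps a "bad" source sentence to its
# "good" corrected target, exactly as in translation-style training data.
parallel_corpus = [
    ("She go to school yesterday.", "She went to school yesterday."),
    ("I have many informations.",   "I have much information."),
]

def extract_edits(src, tgt):
    """Return (operation, source_words, target_words) tuples for each correction."""
    src_tokens, tgt_tokens = src.split(), tgt.split()
    matcher = difflib.SequenceMatcher(a=src_tokens, b=tgt_tokens)
    return [(op, src_tokens[i1:i2], tgt_tokens[j1:j2])
            for op, i1, i2, j1, j2 in matcher.get_opcodes()
            if op != "equal"]

for bad, good in parallel_corpus:
    print(extract_edits(bad, good))
```

In real GEC pipelines, edit extraction of this kind is used for evaluation and error typing; here it simply makes the structure of the parallel data concrete.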
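The multi-round iterative correction described above can be sketched as repeated decoding until a fixpoint is reached: each round applies one correction, and iteration stops when a pass changes nothing. The rule list below is a hypothetical stand-in for one decoding pass of a trained seq2seq corrector, used only to make the control flow concrete:

```python
# Hypothetical substitution rules standing in for one decoding pass of a
# trained corrector; each round fixes at most one error.
RULES = [
    ("eated", "ate"),
    ("a apple", "an apple"),
]

def correct_once(sentence):
    """One decoding round: apply only the first matching rule."""
    for bad, good in RULES:
        if bad in sentence:
            return sentence.replace(bad, good)
    return sentence

def correct_iteratively(sentence, max_rounds=5):
    """Re-decode until the output stops changing (fixpoint) or rounds run out."""
    for _ in range(max_rounds):
        corrected = correct_once(sentence)
        if corrected == sentence:   # fixpoint: no further corrections found
            return sentence
        sentence = corrected
    return sentence

print(correct_iteratively("He eated a apple."))  # two rounds of correction
```

The fixpoint check is what lets a staged model fix interacting errors that a single pass would miss.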
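Dual learning, as used above to generate additional training data, pairs the correction direction with an inverse "good to bad" direction. The inverse model is itself learned in the thesis's setting; the sketch below approximates it with hand-written corruption rules purely to illustrate how synthetic (bad, good) pairs are produced from clean text (the rule list and `corrupt` helper are assumptions, not the thesis's method):

```python
# Hypothetical "good -> bad" noising rules approximating the inverse
# direction of dual learning; real systems learn this corruption model.
NOISE_RULES = [
    ("went", "go"),   # verb-tense error
    ("an ",  "a "),   # article error
]

def corrupt(sentence):
    """Corrupt a clean sentence to synthesise one (bad, good) training pair."""
    noisy = sentence
    for good, bad in NOISE_RULES:
        noisy = noisy.replace(good, bad, 1)  # introduce at most one error per rule
    return noisy, sentence

print(corrupt("She went to buy an apple."))
```

Each clean sentence thus yields a new parallel pair for free, which is the mechanism that alleviates the corpus shortage; production systems make the corruption stochastic so one sentence can yield many distinct pairs.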
Keywords/Search Tags:Natural language processing, Automatic grammar correction, Parallel corpus, Deep learning, Pre-training