
Deep Learning Based Automatic Grammar Error Correction

Posted on: 2020-03-05  Degree: Master  Type: Thesis
Country: China  Candidate: H Y Huang  Full Text: PDF
GTID: 2428330599451436  Subject: Computer technology
Abstract/Summary:
Automatic grammatical error correction (GEC) is one of the most difficult syntactic-analysis tasks in natural language processing. In everyday language, grammatical nuances are among the hardest things for a non-native speaker to grasp, and GEC in practice covers not only grammatical errors but also spelling and collocation errors. In recent years, with the development of deep learning, GEC has received considerable attention. Phrase-based methods built on statistical machine translation (SMT) treat GEC as a translation task from "bad" sentences to "good" ones, trained on parallel corpora similar to translation corpora. Beyond SMT, neural approaches use recurrent neural networks (RNNs), as well as convolutional neural networks (CNNs), to encode sentences and extract phrase-level semantic representations. These methods locate grammatical errors by building an encoder-decoder sequence-to-sequence (seq2seq) model that learns the semantic correspondence, and the differences in wording, between erroneous and correct sentences.

To learn fully from the data, supervised learning is the most common approach, but it requires large amounts of annotated data, and annotation is costly. Researchers have found that unsupervised learning on unlabeled data can mine valuable semantic information that helps downstream supervised tasks. Examples include pre-training on translation corpora, pre-training on long-text corpora, and general-purpose multi-task pre-training. Such pre-trained models have been evaluated on many tasks and can greatly improve model performance. Nevertheless, even with relatively new model architectures, GEC models of practical value remain unsatisfactory because error-correction corpora are scarce.

This study proposes a new stacked model structure and embeds pre-trained features rich in semantic information, yielding a multi-layer error-correction model that can accommodate multiple pre-training methods. The model corrects errors through multiple rounds of iteration, and it uses dual learning to generate additional training data, further alleviating the shortage of error-correction corpora. The overall framework captures correlations between words, the coherence of phrases, semantic matching, and the grammatical correctness of sentences. The staged structure makes each module easy to replace and extend. Experiments on open-source parallel error-correction corpora, together with real correction examples, show that the model achieves good results on academic datasets and is applicable to real-world scenarios. The framework can further integrate the weights of current pre-trained models and is highly scalable, a property not shared by other work, which adds to the significance and future research value of this study.
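The "bad to good" translation framing above rests on parallel corpora of (erroneous, corrected) sentence pairs. A minimal, purely illustrative sketch of such a corpus, with the corrections extracted as word-level edit operations via the standard-library `difflib` (the example sentences and the helper `extract_edits` are hypothetical, not from the thesis):

```python
import difflib

# Toy parallel GEC corpus: each pair maps a "bad" source sentence to its
# "good" corrected target, exactly as in translation-style training data.
parallel_corpus = [
    ("She go to school yesterday.", "She went to school yesterday."),
    ("I have many informations.",   "I have much information."),
]

def extract_edits(src, tgt):
    """Return (operation, source_words, target_words) tuples for each correction."""
    src_tokens, tgt_tokens = src.split(), tgt.split()
    matcher = difflib.SequenceMatcher(a=src_tokens, b=tgt_tokens)
    return [(op, src_tokens[i1:i2], tgt_tokens[j1:j2])
            for op, i1, i2, j1, j2 in matcher.get_opcodes()
            if op != "equal"]

for bad, good in parallel_corpus:
    print(extract_edits(bad, good))
```

In real GEC pipelines, edit extraction of this kind is used for evaluation and error typing; here it simply makes the structure of the parallel data concrete.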
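The multi-round iterative correction described above can be sketched as repeated decoding until a fixpoint is reached: each round applies one correction, and iteration stops when a pass changes nothing. The rule list below is a hypothetical stand-in for one decoding pass of a trained seq2seq corrector, used only to make the control flow concrete:

```python
# Hypothetical substitution rules standing in for one decoding pass of a
# trained corrector; each round fixes at most one error.
RULES = [
    ("eated", "ate"),
    ("a apple", "an apple"),
]

def correct_once(sentence):
    """One decoding round: apply only the first matching rule."""
    for bad, good in RULES:
        if bad in sentence:
            return sentence.replace(bad, good)
    return sentence

def correct_iteratively(sentence, max_rounds=5):
    """Re-decode until the output stops changing (fixpoint) or rounds run out."""
    for _ in range(max_rounds):
        corrected = correct_once(sentence)
        if corrected == sentence:   # fixpoint: no further corrections found
            return sentence
        sentence = corrected
    return sentence

print(correct_iteratively("He eated a apple."))  # two rounds of correction
```

The fixpoint check is what lets a staged model fix interacting errors that a single pass would miss.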
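Dual learning, as used above to generate additional training data, pairs the correction direction with an inverse "good to bad" direction. The inverse model is itself learned in the thesis's setting; the sketch below approximates it with hand-written corruption rules purely to illustrate how synthetic (bad, good) pairs are produced from clean text (the rule list and `corrupt` helper are assumptions, not the thesis's method):

```python
# Hypothetical "good -> bad" noising rules approximating the inverse
# direction of dual learning; real systems learn this corruption model.
NOISE_RULES = [
    ("went", "go"),   # verb-tense error
    ("an ",  "a "),   # article error
]

def corrupt(sentence):
    """Corrupt a clean sentence to synthesise one (bad, good) training pair."""
    noisy = sentence
    for good, bad in NOISE_RULES:
        noisy = noisy.replace(good, bad, 1)  # introduce at most one error per rule
    return noisy, sentence

print(corrupt("She went to buy an apple."))
```

Each clean sentence thus yields a new parallel pair for free, which is the mechanism that alleviates the corpus shortage; production systems make the corruption stochastic so one sentence can yield many distinct pairs.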
Keywords/Search Tags:Natural language processing, Automatic grammar correction, Parallel corpus, Deep learning, Pre-training