
Research And Implementation Of Grammar Error Correction Model Based On Deep Learning

Posted on: 2021-04-15  Degree: Master  Type: Thesis
Country: China  Candidate: Y Q Gao  Full Text: PDF
GTID: 2428330620464042  Subject: Engineering
Abstract/Summary:
With the development and popularization of Internet technology, the volume of electronic text keeps growing. This explosive growth has been accompanied by a decline in text quality, and manual review and assessment is clearly impractical. The task of grammatical error correction has therefore attracted increasing attention in recent years. Machine translation technology, benefiting from the rapid development of deep learning, has made a series of breakthroughs, and sequence-to-sequence networks are now widely used in grammatical error correction tasks. This thesis designs a method that combines statistical machine translation with neural machine translation. The main contributions are as follows:

First, the training corpora are preprocessed. The NLPCC 2018 Chinese Grammatical Error Correction (CGEC) shared-task training set is preprocessed for training the grammatical error correction model. The Chinese Wikipedia corpus is also preprocessed: text is extracted from the compressed dumps using the Wikipedia extractor tool and the gensim wiki corpus library, and is used to train Chinese word vectors and N-Gram language models. The HSK dynamic composition corpus is preprocessed and used to expand the training set. Finally, the SIGHAN 2013 CSC corpus is preprocessed for the spelling correction model.

Second, this thesis combines statistical learning with deep learning, using an N-Gram language model to resolve Chinese spelling errors. After training, the model scores each word in a sentence; positions with low scores are treated as positions to be corrected. Candidate sets are constructed based on SIGHAN 2013 CSC, and the candidate sentence with the lowest perplexity is selected.

Third, this thesis uses the deep learning Seq2Seq_Attention and Transformer models to eliminate deeper-level errors, and improves model performance through data cleaning, data augmentation, sub-word-level modeling, curriculum learning strategies, and masked sequence-to-sequence strategies. Finally, a model ensemble method sends the output of each model to the N-Gram language model for scoring and selects the highest-scoring output as the final result.

The NLPCC 2018 official benchmark set is used to evaluate the model designed in this thesis, and the experiments show that the adopted methods improve model performance. Among them, the model ensemble method performs best: its F0.5 value improves from 21.16 to 26.14, which is 4.98 percentage points higher than the result of the Computational Linguistics Research Center of Peking University, demonstrating that the proposed model is effective.
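The N-Gram scoring mechanism described above — score a sentence with a language model and pick the candidate with the lowest perplexity, both for spelling correction and for the final ensemble selection — can be sketched as follows. This is a minimal, illustrative bigram model with add-one smoothing; the class and function names are assumptions for illustration, not the thesis's actual implementation (which trains on Chinese Wikipedia and uses higher-order N-Grams).

```python
import math
from collections import Counter

class BigramLM:
    """Minimal character-level bigram LM with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sent in sentences:
            tokens = ["<s>"] + list(sent) + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)

    def logprob(self, prev, word):
        # Smoothed conditional probability P(word | prev).
        num = self.bigrams[(prev, word)] + 1
        den = self.unigrams[prev] + self.vocab_size
        return math.log(num / den)

    def perplexity(self, sent):
        tokens = ["<s>"] + list(sent) + ["</s>"]
        lp = sum(self.logprob(p, w) for p, w in zip(tokens, tokens[1:]))
        return math.exp(-lp / (len(tokens) - 1))

def best_candidate(lm, candidates):
    # Both the spelling-correction step and the final ensemble pick the
    # candidate the language model scores best, i.e. lowest perplexity.
    return min(candidates, key=lm.perplexity)
```

In the thesis's pipeline, `candidates` would be the sentences produced by substituting confusion-set characters at low-scoring positions (or, in the ensemble step, the outputs of the different correction models), and the LM would be trained on the preprocessed Chinese Wikipedia corpus.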
Keywords/Search Tags:Chinese grammatical error correction, N-Gram language model, Seq2Seq_Attention network, Transformer network, model ensemble