
Research And Implementation Of Grammar Error Correction Model Based On Deep Learning

Posted on: 2021-04-15  Degree: Master  Type: Thesis
Country: China  Candidate: Y Q Gao  Full Text: PDF
GTID: 2428330620464042  Subject: Engineering
Abstract/Summary:
With the development and popularization of Internet technology, the volume of electronic text keeps growing. This explosive growth has been accompanied by a decline in text quality, and manual review and assessment is clearly impractical. The task of grammatical error correction has therefore attracted increasing attention in recent years. Machine translation technology, benefiting from the rapid development of deep learning, has made a series of breakthroughs, and sequence-to-sequence networks are now widely used in grammatical error correction tasks. This thesis designs a method that combines statistical machine translation with neural machine translation. The main contributions are as follows:

First, the training corpora are preprocessed. The NLPCC 2018 Chinese Grammatical Error Correction (CGEC) shared-task training set is preprocessed for training the grammatical error correction model. The Chinese Wikipedia corpus is also preprocessed: text is extracted from the compressed dumps using the Wikipedia extractor tool and the gensim wiki corpus library, and is used to train Chinese word vectors and N-Gram language models. The HSK dynamic composition corpus is preprocessed and used to expand the training set. Finally, the SIGHAN 2013 CSC corpus is preprocessed for the spelling correction model.

Second, this thesis combines statistical learning with deep learning, using an N-Gram language model to resolve Chinese spelling errors. After training, the model scores each word in a sentence; positions with low scores are treated as positions to be corrected. Candidate sets are constructed based on SIGHAN 2013 CSC, and the candidate sentence with the lowest perplexity is selected.

Third, this thesis uses the deep learning Seq2Seq_Attention and Transformer models to eliminate deeper-level errors, and improves model performance through data cleaning, data augmentation, sub-word-level modeling, curriculum learning strategies, and masked sequence-to-sequence strategies. Finally, a model ensemble method sends the output of each model to the N-Gram language model for scoring and selects the highest-scoring output as the final result.

The NLPCC 2018 official benchmark set is used to evaluate the model designed in this thesis, and the experiments show that the adopted methods improve model performance. Among them, the model ensemble method performs best: its F0.5 value improves from 21.16 to 26.14, which is 4.98 percentage points higher than the result of the Computational Linguistics Research Center of Peking University, demonstrating that the proposed model is effective.
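The N-Gram scoring mechanism described above — score a sentence with a language model and pick the candidate with the lowest perplexity, both for spelling correction and for the final ensemble selection — can be sketched as follows. This is a minimal, illustrative bigram model with add-one smoothing; the class and function names are assumptions for illustration, not the thesis's actual implementation (which trains on Chinese Wikipedia and uses higher-order N-Grams).

```python
import math
from collections import Counter

class BigramLM:
    """Minimal character-level bigram LM with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sent in sentences:
            tokens = ["<s>"] + list(sent) + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)

    def logprob(self, prev, word):
        # Smoothed conditional probability P(word | prev).
        num = self.bigrams[(prev, word)] + 1
        den = self.unigrams[prev] + self.vocab_size
        return math.log(num / den)

    def perplexity(self, sent):
        tokens = ["<s>"] + list(sent) + ["</s>"]
        lp = sum(self.logprob(p, w) for p, w in zip(tokens, tokens[1:]))
        return math.exp(-lp / (len(tokens) - 1))

def best_candidate(lm, candidates):
    # Both the spelling-correction step and the final ensemble pick the
    # candidate the language model scores best, i.e. lowest perplexity.
    return min(candidates, key=lm.perplexity)
```

In the thesis's pipeline, `candidates` would be the sentences produced by substituting confusion-set characters at low-scoring positions (or, in the ensemble step, the outputs of the different correction models), and the LM would be trained on the preprocessed Chinese Wikipedia corpus.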
Keywords/Search Tags:Chinese grammatical error correction, N-Gram language model, Seq2Seq_Attention network, Transformer network, model ensemble