
Research on Chinese Grammar Error Correction Based on Deep Learning

Posted on: 2022-12-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y Feng
GTID: 2518306746468754
Subject: Computer Science and Technology
Abstract/Summary:
In the Internet age, massive amounts of text are generated around the world every moment, and much of this text contains errors. If it is not proofread, these errors can seriously affect downstream work, yet conventional manual proofreading can no longer keep pace with the speed at which text is produced today. With the development of deep learning and natural language processing, both academia and industry have carried out research on text error correction. Text errors can be divided into shallow and deep errors: spelling and punctuation errors belong to the former, and grammatical errors to the latter. Shallow errors can be corrected with rules and language models, whereas traditional machine-learning-based correction methods perform poorly on deep errors. Deep grammatical error correction is therefore both the core and the main difficulty of current text error correction technology, and the mainstream of current research is based on deep learning, training neural network models on large-scale grammatical error correction tasks. This paper first reviews and summarizes the state of research on text error correction and then, building on existing deep-learning-based methods for automatically correcting Chinese text, proposes a practical method for Chinese grammatical error correction. The main work of this paper is as follows:
(1) The research background and significance of automatic text error correction are explained, research progress on Chinese and English text error correction is analyzed and summarized, and related work is introduced.
(2) For Chinese grammatical error correction, an error correction model based on the UniLM framework is proposed. It operates at word granularity, initializes the model with pre-trained parameters, and fine-tunes it on a task-specific training corpus.
(3) A UniLM+CRF framework is built for the task of tagging grammatical errors in Chinese text, marking possible grammatical errors in the text.
(4) A seq2seq framework based on UniLM is built for the task of Chinese grammatical error correction, correcting possible grammatical errors in the text.
(5) The corpus is preprocessed. This paper uses the public dataset provided by NLPCC 2018 Shared Task 2, Grammatical Error Correction (GEC), and applies cleaning, denoising, word segmentation, stop-word removal, and other operations to improve dataset quality and thereby help improve the accuracy of model training.
(6) A method is proposed for generating samples for the grammatical error tagging task from a set of edit operations, which supplies the labels for the tagging task's training samples. Grammatical errors are divided into four categories, S (word substitution), R (redundant word), M (missing word), and W (word order error), and errors in the text are tagged according to their category.
(7) The experimental results are analyzed and summarized, and the model's performance is evaluated with precision, recall, and F-score, which are computed with the public scorer.
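Items (2) and (4) rely on UniLM, which reuses a single Transformer for both understanding and generation by changing only the self-attention mask. In its sequence-to-sequence mode, tokens in the source segment (here, the erroneous sentence) attend to the whole source segment, while each target token (the correction) attends to the full source and to target tokens up to and including itself. The minimal sketch below builds that mask as a 0/1 matrix in plain Python. The helper name `unilm_seq2seq_mask` is hypothetical, and the assumption that `src_len` covers the whole source segment including its special tokens is ours; the abstract does not show the thesis's implementation.

```python
def unilm_seq2seq_mask(src_len: int, tgt_len: int) -> list[list[int]]:
    """Return an (src_len + tgt_len) x (src_len + tgt_len) attention mask.

    mask[i][j] == 1 means position i may attend to position j.
    Hypothetical helper: src_len is assumed to cover the whole source
    segment (e.g. [CLS] tokens [SEP]); tgt_len covers the target segment.
    """
    total = src_len + tgt_len
    mask = [[0] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            if j < src_len:
                # Every position sees the full, bidirectional source segment.
                mask[i][j] = 1
            elif i >= src_len and j <= i:
                # Target positions see earlier target positions and themselves
                # (left-to-right), which makes decoding autoregressive.
                mask[i][j] = 1
    return mask
```

Source rows never see the target, so the encoder view of the erroneous sentence is unaffected by the correction being generated. A real model would build this pattern as a batched tensor; the nested loops only make the pattern explicit.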
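For the UniLM+CRF tagger in item (3), the encoder produces a score for every tag at every token, and a CRF layer adds tag-to-tag transition scores; at inference time the best tag sequence is usually recovered with Viterbi decoding. The following is a minimal decoding sketch, assuming the emission and transition scores have already been computed. The function name and score layout are hypothetical, and CRF training (the forward algorithm for the log-partition function) is omitted.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring sequence of tag indices.

    emissions[t][k]: score of tag k at token t (e.g. from the encoder).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    n_tags = len(transitions)
    # score[k]: best score of any path that ends in tag k at the current token.
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            # Choose the best previous tag i for the current tag j.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag to recover the full path.
    best_last = max(range(n_tags), key=lambda k: score[k])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

With a tag set such as O (assumed here to mean "no error"), S, R, M and W, the returned indices map back to one error label per token. A strongly negative transition score can override a locally high emission, which is the reason for adding a CRF on top of per-token classification.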
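Item (5) lists cleaning, denoising, word segmentation and stop-word removal without giving details. The sketch below is one possible reading under stated assumptions: NFKC normalization plus whitespace removal as cleaning, a length-ratio filter on source/target pairs as denoising, forward maximum matching against a toy dictionary as segmentation, and a toy stop-word list. The names `DICTIONARY`, `STOP_WORDS`, `keep_pair` and the `max_ratio` threshold are hypothetical; a real pipeline would load a full dictionary or use a dedicated segmenter.

```python
import unicodedata

# Hypothetical toy resources for illustration only.
DICTIONARY = {"我们", "今天", "学习", "语法", "纠错"}
STOP_WORDS = {"的", "了"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)


def clean(text: str) -> str:
    """NFKC-normalize (full-width letters, digits and punctuation such as
    '，' become half-width) and drop all whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return "".join(text.split())


def segment(text: str) -> list[str]:
    """Forward maximum matching: take the longest dictionary word at each
    position, falling back to a single character."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in DICTIONARY:
                words.append(piece)
                i += size
                break
    return words


def keep_pair(src: str, tgt: str, max_ratio: float = 1.5) -> bool:
    """Hypothetical denoising rule: drop empty pairs and pairs whose lengths
    differ too much to be a plausible correction."""
    if not src or not tgt:
        return False
    return max(len(src), len(tgt)) / min(len(src), len(tgt)) <= max_ratio


def preprocess(text: str) -> list[str]:
    """Clean, segment, then remove stop words."""
    return [w for w in segment(clean(text)) if w not in STOP_WORDS]
```

For example, `preprocess("我们 今天学习了语法纠错")` segments the cleaned sentence into dictionary words and removes the stop word 了.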
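Item (6) derives tagging labels from a set of edit operations. One way to obtain such operations, assumed here rather than taken from the thesis, is to align each erroneous sentence with its correction using Python's `difflib.SequenceMatcher` and map its opcodes to the four categories: `replace` to S, `delete` to R, `insert` to M, and a deleted span that reappears verbatim as an insertion to W. The `O` tag for error-free tokens and the convention of placing M on the token before which a word is missing are likewise assumptions, and `edit_labels` is a hypothetical name.

```python
import difflib


def edit_labels(src: list[str], tgt: list[str]) -> list[str]:
    """Tag each source token with O (no error), S, R, M or W.

    Hypothetical scheme: replace -> S, delete -> R, insert -> M (tagged on
    the source token before which the missing word belongs, or on the last
    token when it is missing at the end), and a deleted span that reappears
    verbatim as an inserted span -> W.
    """
    labels = ["O"] * len(src)
    deletes, inserts = [], []
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            for k in range(i1, i2):
                labels[k] = "S"
        elif tag == "delete":
            deletes.append((i1, i2, src[i1:i2]))
        elif tag == "insert":
            inserts.append((i1, tgt[j1:j2]))
    for i1, i2, removed in deletes:
        # A span that was moved rather than removed is a word-order error.
        match = next((ins for ins in inserts if ins[1] == removed), None)
        if match is not None:
            inserts.remove(match)
            label = "W"
        else:
            label = "R"
        for k in range(i1, i2):
            labels[k] = label
    for i1, _ in inserts:
        if src:
            labels[min(i1, len(src) - 1)] = "M"
    return labels
```

Tokens can be characters or segmented words. The W rule only fires when the alignment splits a move into a deletion and an identical insertion; moves that the alignment reports as a `replace` fall back to S.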
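Item (7) evaluates the model with precision, recall and F-score. Edit-based GEC scorers compare the set of edits a system proposes with gold-standard edits. To our understanding, the NLPCC 2018 GEC task used the MaxMatch (M2) scorer and reported F0.5, which weights precision more heavily than recall. The sketch below shows only the final arithmetic on two already-extracted edit sets. The edit tuple format is an assumption, and the real M2 scorer additionally searches for the system edit set that best matches the annotation, which is not reproduced here.

```python
def prf(system_edits: set, gold_edits: set, beta: float = 0.5):
    """Precision, recall and F_beta over sets of edits.

    Each edit is assumed to be a hashable tuple such as
    (start, end, replacement). Empty denominators yield 0.0 in this sketch.
    """
    tp = len(system_edits & gold_edits)  # edits that match the gold standard
    p = tp / len(system_edits) if system_edits else 0.0
    r = tp / len(gold_edits) if gold_edits else 0.0
    if p + r == 0:
        return p, r, 0.0
    f = (1 + beta**2) * p * r / (beta**2 * p + r)
    return p, r, f
```

Setting `beta=1.0` gives the balanced F1 score. With `beta=0.5`, a conservative system that proposes few but correct edits scores higher than one that proposes many edits of which only some are correct.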
Keywords/Search Tags: Grammar Error Correction, Transformer, UniLM, BERT, Attention Mechanism, Pre-training