
Research On Chinese Text Proofreading Method Based On Deep Learning

Posted on: 2022-11-12
Degree: Master
Type: Thesis
Country: China
Candidate: X L Bai
Full Text: PDF
GTID: 2518306743970459
Subject: Electronics and Communications Engineering
Abstract/Summary:
Text proofreading has always been part of daily life. As more and more text is produced and read electronically, improving the performance of automatic text proofreading models has become increasingly urgent. This thesis reviews the research history of Chinese text proofreading, analyzes and summarizes the existing methods, and finds that the traditional methods suffer from many problems. Applying deep learning effectively to Chinese text proofreading is therefore a challenging and meaningful task. After surveying the literature and testing the deep learning models used for Chinese text proofreading in recent years, SpellGCN was selected as the baseline model on the basis of the evaluation results. The research addresses several problems encountered in the field: the scarcity of labeled data, catastrophic forgetting in incremental learning, and the model's limited understanding of word semantics and contextual semantic relationships. To address them, this thesis proposes improvements in four directions (data augmentation, also called data enhancement; incremental learning; semantic proofreading; and algorithm optimization), with the aim of improving the performance of existing Chinese text proofreading models and ensuring the reliability of proofreading. The main work is as follows:
(1) Analyze the ratio of pronunciation errors to shape errors in text, and augment the existing Chinese text proofreading data by injecting noise. The core idea is to increase the scale and diversity of existing datasets through noise replacement, thereby improving the generalization ability of the model.
(2) Study the effect of incremental training on the Chinese text proofreading model, and propose an incremental training method based on a replay (playback) mechanism, which effectively alleviates catastrophic forgetting during training.
(3) Study how to introduce semantic knowledge into the proofreading process so that the model can identify misused words and entities in the text and perform Chinese semantic knowledge proofreading. A semantic knowledge proofreading dataset is constructed as the knowledge driver, which further enhances the model's ability to learn semantic relationships across context.
(4) Combine the detection network of the Soft-Masked BERT algorithm with SpellGCN to improve and optimize it, yielding the proposed Soft-Masked SpellGCN algorithm. Experiments verify that the Soft-Masked SpellGCN model achieves better evaluation results than both the Soft-Masked BERT model and the SpellGCN model under the same training data and training environment.
In summary, this thesis applies knowledge-driven and deep learning methods to semantic knowledge proofreading and proposes the Soft-Masked SpellGCN algorithm, which effectively improves the overall performance of spelling proofreading and semantic knowledge proofreading models.
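The noise-replacement augmentation described in item (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: the confusion sets, the function name `add_noise`, and the default values of `noise_rate` and `phonetic_ratio` are all assumptions chosen for illustration. The abstract says only that the ratio of pronunciation errors to shape errors is analyzed; it does not give the value.

```python
import random

# Hypothetical toy confusion sets. A real system would load large
# pronunciation-similarity and shape-similarity dictionaries instead.
PHONETIC_CONFUSIONS = {"的": ["地", "得"], "在": ["再"], "做": ["作"]}
SHAPE_CONFUSIONS = {"己": ["已"], "未": ["末"], "天": ["夫"]}


def add_noise(sentence, noise_rate=0.15, phonetic_ratio=0.7, rng=None):
    """Corrupt a correct sentence by character replacement.

    Returns (noisy_sentence, labels), where labels[i] is 1 if the
    character at position i was replaced and 0 otherwise.
    noise_rate and phonetic_ratio are placeholder values, not figures
    reported in the thesis.
    """
    rng = rng or random.Random()
    noisy_chars, labels = [], []
    for ch in sentence:
        replacement = ch
        if rng.random() < noise_rate:
            # Pick the error type according to the assumed ratio, and fall
            # back to the other confusion set if the first has no entry.
            if rng.random() < phonetic_ratio:
                primary, fallback = PHONETIC_CONFUSIONS, SHAPE_CONFUSIONS
            else:
                primary, fallback = SHAPE_CONFUSIONS, PHONETIC_CONFUSIONS
            candidates = primary.get(ch) or fallback.get(ch)
            if candidates:
                replacement = rng.choice(candidates)
        noisy_chars.append(replacement)
        labels.append(int(replacement != ch))
    return "".join(noisy_chars), labels
```

Each (noisy sentence, original sentence) pair can then serve as an additional training example, with `labels` marking the positions a detection network should flag. Because replacement is one character for one character, the noisy sentence keeps the length of the original, which matches the character-aligned setup that spelling-correction models such as SpellGCN and Soft-Masked BERT operate on.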
Keywords/Search Tags: Chinese Text Proofreading, Deep Learning, Data Enhancement, Incremental Learning, Semantic Proofreading