
Design And Implementation Of Online Intelligent Text Proofreading System

Posted on: 2021-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: X Chen
GTID: 2518306575455604
Subject: Software engineering
Abstract/Summary:
Chinese spelling correction is an important and challenging task in natural language processing. It faces several problems: Chinese text has no obvious word delimiters, spelling errors come in many types, and there is a lack of publicly available, high-quality Chinese spelling correction corpora. To address these problems, an online intelligent text proofreading system is designed and built.

The key to the Chinese spelling correction task is understanding the contextual semantics. The traditional BERT-based sequence labeling scheme has achieved good results on this task, but it is limited by the fact that the BERT pre-training task learns from only 15% of the tokens, and it often chooses not to correct ambiguous spelling errors. To address these shortcomings, the system uses a new two-stage neural network structure consisting of an error checking network and an error correction network. The error checking network identifies different types of spelling errors from all tokens at the word level, while the error correction network corrects the identified errors. Word-level error checking also addresses the two problems of missing word delimiters and the many types of spelling errors.

To address the lack of a Chinese spelling error corpus, the system uses word-frequency distribution sampling and EDA data augmentation to construct a corpus with spelling errors. The word-frequency distribution sampling technique samples words from a correct corpus according to the word-frequency distribution and alters the sampled words to introduce errors, while the EDA data augmentation technique replaces words in the misspelled corpus with words of the same meaning. Both techniques can construct large amounts of corpus data containing misspellings. Moreover, the system's data collection mechanism accumulates spelling errors from real-life use, further alleviating the lack of a Chinese spelling correction corpus.

Considering the real-time requirements of the system and its limited computing resources, model distillation is applied to the neural networks in the system to reduce the number of parameters and the memory footprint, and to improve training efficiency and inference speed. The online intelligent text proofreading system is developed with web-based technology, and the algorithms used in its spelling correction function are compared. Compared with the BERT-based spelling correction scheme, the proposed two-stage neural network improves the error correction rate by 0.7 and 1.7 percentage points on the SIGHAN and Wikipedia Chinese spelling correction test sets, respectively, and its error detection rate reaches 81.2%, so the system can provide spelling correction services for text writers. In addition, the Chinese spelling data accumulated while the system is in use will be made publicly available, in the hope of contributing to the field of Chinese spelling correction.
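The word-frequency sampling step described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the thesis implementation: the function name `build_error_corpus`, the `confusion` dictionary of misspelled variants, and the assumption that sentences are already segmented into words are all hypothetical.

```python
import random
from collections import Counter


def build_error_corpus(sentences, confusion, n_errors=1, seed=0):
    """Pair each clean, pre-segmented sentence with a corrupted copy.

    Hypothetical sketch: words to corrupt are sampled in proportion to
    their frequency in the clean corpus, then swapped for a misspelled
    variant taken from `confusion` (word -> list of variants).
    """
    rng = random.Random(seed)
    # Frequency of every corruptible word across the clean corpus.
    freq = Counter(w for s in sentences for w in s if w in confusion)

    pairs = []
    for sentence in sentences:
        noisy = list(sentence)
        # Positions whose word has at least one misspelled variant.
        candidates = [i for i, w in enumerate(noisy) if w in confusion]
        if candidates:
            weights = [freq[noisy[i]] for i in candidates]
            k = min(n_errors, len(candidates))
            # Weighted sampling with replacement; duplicates are merged.
            chosen = sorted(set(rng.choices(candidates, weights=weights, k=k)))
            for i in chosen:
                noisy[i] = rng.choice(confusion[noisy[i]])
        pairs.append((noisy, list(sentence)))
    return pairs


# Toy data, assumed for illustration only.
clean = [["我", "在", "学校", "上课"], ["他", "在", "公司", "工作"]]
confusion = {"在": ["再"], "学校": ["学笑"], "工作": ["公作", "工做"]}

for noisy, original in build_error_corpus(clean, confusion):
    print("".join(noisy), "<-", "".join(original))
```

Each returned pair keeps the clean sentence alongside its corrupted copy, which is the form a detection or correction network would typically be trained on. A real system would need a proper word segmenter and a much larger set of misspelled variants.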
Keywords/Search Tags: Chinese Text Spelling Correction, Online Intelligent Text Proofreading System, Two-stage Neural Network, Model Distillation