Research and Application of Key Techniques in Chinese Text Proofreading

Posted on: 2020-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: C Wu
GTID: 2428330596975469
Subject: Computer Science and Technology
Abstract/Summary:
With the development of microcomputer technology, information is being generated ever more rapidly, and almost all paperwork is now done on computers. Wherever text is involved, however, errors are inevitable. Traditional proofreading techniques can hardly keep up with the growing demand for text proofreading, so research on automatic text proofreading is urgently needed. Automatic Chinese text proofreading can correct errors in texts from many fields, including government announcements and news, academic papers and research reports, and the output of Chinese input methods and speech recognition, which gives the research broad applications and practical value. Building on an in-depth study of text proofreading research in China and abroad, this thesis covers the following three aspects:

1. Chinese spelling error proofreading. After analyzing the strengths and weaknesses of algorithms proposed by previous researchers, the thesis proposes a k-shortest-path fuzzy word segmentation algorithm based on LSTM and N-gram language models. The algorithm has three stages. First, a fuzzy matching algorithm matches strings in the sentence against the dictionary, and the resulting candidate correction words form a word lattice. Next, a bigram language model finds the k best sentences in the lattice. Finally, trigram and LSTM language models rerank these k sentences to produce the final correction. On the SIGHAN2013 dataset, the algorithm outperformed other proofreading systems.

2. Chinese grammatical error proofreading. Taking into account the characteristics of the Chinese grammatical error proofreading task and the shortcomings of existing methods, the thesis proposes a grammar proofreading method based on a language model and neural machine translation. The core of the method is the convolutional sequence-to-sequence (Conv-seq2seq) model. For training, an error-generation model is trained on a seed corpus of correct/erroneous sentence pairs and used to produce additional erroneous sentences, enlarging the training corpus. Inspired by a method that has proved effective for translating low-resource languages, the model is initialized with the parameters of a pre-trained English-Chinese translation model to improve its performance. At inference time, spelling errors in a sentence are corrected first, the sentence is then fed into the model, and finally a language model reranks the beam-search results to select the sentences that best fit the characteristics of Chinese. On the NLPCC2018 dataset, the method surpassed other systems in F0.5.

3. Chinese text proofreading test system. Using the lightweight web framework Flask as its core, a Chinese text proofreading test system based on the B/S (browser/server) architecture is designed and implemented. The system consists of four modules: knowledge acquisition, front-end interaction, pre-processing, and automatic proofreading. It implements word-level error proofreading, grammatical error proofreading, and punctuation and numeral proofreading for Chinese text.
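The first two stages of the spelling algorithm described above (fuzzy dictionary matching into a word lattice, then a bigram k-best search) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the dictionary, the confusion set `CONFUSION`, the toy corpus and all function names are hypothetical, fuzzy matching is reduced to substituting at most one confusable character per span, and the k-best search is an exact dynamic program over (position, last word) states in place of whatever k-shortest-path algorithm the thesis uses.

```python
import math
from collections import Counter

# Hypothetical toy resources (assumptions): a real system would use a large
# dictionary, a pinyin/shape confusion set and an LM trained on a big corpus.
DICTIONARY = {"他", "在", "图书", "图书馆", "看", "书", "看书", "学习", "我", "家"}
CONFUSION = {"官": ["馆", "管"]}  # similar-sounding characters
CORPUS = [["他", "在", "图书馆", "看书"],
          ["我", "在", "图书馆", "学习"],
          ["他", "在", "家", "看书"]]


def fuzzy_variants(span):
    """Yield the span and every variant with one character swapped for a confusable one."""
    yield span
    for i, ch in enumerate(span):
        for alt in CONFUSION.get(ch, []):
            yield span[:i] + alt + span[i + 1:]


def build_lattice(sentence, max_len=4):
    """Word lattice as a set of edges (start, end, word)."""
    edges = set()
    n = len(sentence)
    for i in range(n):
        edges.add((i, i + 1, sentence[i]))  # always keep the original character
        for j in range(i + 1, min(n, i + max_len) + 1):
            for cand in fuzzy_variants(sentence[i:j]):
                if cand in DICTIONARY:
                    edges.add((i, j, cand))
    return edges


def train_bigram(corpus):
    """Return an add-one smoothed bigram function log P(word | prev)."""
    bigrams, history = Counter(), Counter()
    vocab = {"</s>"}
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        vocab.update(sent)
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[prev, word] += 1
            history[prev] += 1
    v = len(vocab) + 1  # +1 reserves probability mass for unseen words

    def logprob(prev, word):
        return math.log((bigrams[prev, word] + 1) / (history[prev] + v))
    return logprob


def k_best(sentence, logprob, k=3):
    """Exact k-best paths through the lattice under a bigram LM.

    Dynamic programming over states (position, last word), keeping the
    k best partial paths per state."""
    out = {}
    for i, j, w in build_lattice(sentence):
        out.setdefault(i, []).append((j, w))
    n = len(sentence)
    chart = [dict() for _ in range(n + 1)]  # chart[pos][last_word] -> [(score, words)]
    chart[0]["<s>"] = [(0.0, [])]
    for pos in range(n):
        for last, paths in chart[pos].items():
            for j, w in out.get(pos, []):
                bucket = chart[j].setdefault(w, [])
                bucket.extend((s + logprob(last, w), words + [w]) for s, words in paths)
                bucket.sort(key=lambda x: x[0], reverse=True)
                del bucket[k:]
    finals = [(s + logprob(last, "</s>"), words)
              for last, paths in chart[n].items() for s, words in paths]
    finals.sort(key=lambda x: x[0], reverse=True)
    return finals[:k]


if __name__ == "__main__":
    lm = train_bigram(CORPUS)
    for score, words in k_best("他在图书官看书", lm):
        print(f"{score:.2f}", "/".join(words))
```

With the negative log-probability as edge length, the k highest-scoring paths are the k shortest paths through the lattice. On this toy data the top path is 他/在/图书馆/看书, because every one of its bigrams occurs in the corpus while the alternatives contain unseen bigrams.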
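The final reranking stage of the spelling algorithm can be illustrated with a smoothed trigram model alone. In this sketch the toy corpus, the interpolation `weight` and the function names are assumptions; the LSTM language model the thesis also uses is omitted, and its score could enter as one more weighted term in the sort key.

```python
import math
from collections import Counter

# Hypothetical pre-segmented toy corpus (an assumption for illustration).
CORPUS = [["他", "在", "图书馆", "看书"],
          ["我", "在", "图书馆", "学习"],
          ["他", "在", "家", "看书"]]


def train_trigram(corpus):
    """Return a function giving the add-one smoothed trigram log-probability of a sentence."""
    trigrams, contexts = Counter(), Counter()
    vocab = {"</s>"}
    for sent in corpus:
        tokens = ["<s>", "<s>"] + sent + ["</s>"]
        vocab.update(sent)
        for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
            trigrams[a, b, c] += 1
            contexts[a, b] += 1
    v = len(vocab) + 1  # +1 reserves probability mass for unseen words

    def sentence_logprob(words):
        tokens = ["<s>", "<s>"] + list(words) + ["</s>"]
        return sum(math.log((trigrams[a, b, c] + 1) / (contexts[a, b] + v))
                   for a, b, c in zip(tokens, tokens[1:], tokens[2:]))
    return sentence_logprob


def rerank(candidates, lm_logprob, weight=1.0):
    """Sort (first_pass_score, words) candidates by first-pass score plus weighted LM score."""
    return sorted(candidates,
                  key=lambda cand: cand[0] + weight * lm_logprob(cand[1]),
                  reverse=True)


if __name__ == "__main__":
    trigram = train_trigram(CORPUS)
    # Hypothetical first-pass k-best list in which the wrong candidate ranks first.
    first_pass = [(-1.0, ["他", "在", "图书", "官", "看书"]),
                  (-1.2, ["他", "在", "图书馆", "看书"])]
    for cand in rerank(first_pass, trigram):
        print("/".join(cand[1]))
```

Here the trigram score overturns the first-pass order, promoting 他/在/图书馆/看书. In practice the weight would be tuned on held-out data.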
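The thesis trains a neural error-generation model on correct/erroneous seed pairs to enlarge the training corpus. As a much simpler stand-in (a swapped-in statistical technique, not the thesis's model), the sketch below learns fragment-level substitutions from hypothetical seed pairs with `difflib` and applies them to correct sentences to produce new (erroneous, correct) training pairs. The seed data and function names are assumptions.

```python
import difflib
import random
from collections import Counter

# Hypothetical seed corpus of (correct, erroneous) sentence pairs.
SEED_PAIRS = [("我在图书馆看书", "我在图书官看书"),
              ("今天天气很好", "今天天气很号"),
              ("我们的作业", "我们得作业")]


def learn_edits(seed_pairs):
    """Count (correct fragment -> erroneous fragment) edits observed in the seed pairs."""
    edits = Counter()
    for correct, wrong in seed_pairs:
        matcher = difflib.SequenceMatcher(None, correct, wrong)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Pure insertions have an empty source fragment; this sketch skips them.
            if tag != "equal" and i1 < i2:
                edits[correct[i1:i2], wrong[j1:j2]] += 1
    return edits


def corrupt(sentence, edits, rng):
    """Apply one learned edit, sampled by frequency; return None if no edit applies."""
    applicable = [edit for edit in edits if edit[0] in sentence]
    if not applicable:
        return None
    src, dst = rng.choices(applicable, weights=[edits[e] for e in applicable])[0]
    starts = [i for i in range(len(sentence)) if sentence.startswith(src, i)]
    i = rng.choice(starts)  # corrupt one randomly chosen occurrence
    return sentence[:i] + dst + sentence[i + len(src):]


def augment(correct_sentences, edits, seed=0):
    """Build extra (erroneous, correct) training pairs from correct sentences."""
    rng = random.Random(seed)
    pairs = []
    for sentence in correct_sentences:
        wrong = corrupt(sentence, edits, rng)
        if wrong is not None and wrong != sentence:
            pairs.append((wrong, sentence))
    return pairs


if __name__ == "__main__":
    edits = learn_edits(SEED_PAIRS)
    print(augment(["他在图书馆学习", "这本书很好", "没有错误"], edits))
```

Sampling edits in proportion to their seed frequency keeps the synthetic errors roughly in line with the error distribution of the seed corpus, which is the goal a learned generator also serves; sentences that no learned edit applies to are simply skipped.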
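The inference order described for grammatical error proofreading (spelling correction, then the Conv-seq2seq model's beam search, then language-model reranking) can be written as a small composition of functions. The three `toy_*` callables are hypothetical stubs standing in for a spelling corrector, a trained model and a language model, and the linear score combination with `lm_weight` is an assumption, since the abstract does not say how the two scores are combined.

```python
from typing import Callable, List, Tuple


def proofread_grammar(sentence: str,
                      spell_correct: Callable[[str], str],
                      beam_search: Callable[[str], List[Tuple[str, float]]],
                      lm_score: Callable[[str], float],
                      lm_weight: float = 0.5) -> str:
    """Spelling correction, then seq2seq beam search, then LM reranking."""
    # 1. Fix spelling errors before the grammar model sees the sentence.
    fixed = spell_correct(sentence)
    # 2. n-best hypotheses (text, model log-probability) from the seq2seq model.
    hypotheses = beam_search(fixed)
    if not hypotheses:
        return fixed
    # 3. Rerank by a weighted sum of model score and language-model score.
    best, _ = max(hypotheses, key=lambda h: h[1] + lm_weight * lm_score(h[0]))
    return best


def toy_spell(s: str) -> str:  # hypothetical spelling corrector
    return s.replace("图书官", "图书馆")


def toy_beam(s: str) -> List[Tuple[str, float]]:  # hypothetical Conv-seq2seq stand-in
    return [(s.replace("看书", "的看书"), -0.9), (s, -1.0)]


TOY_LM = {"他在图书馆的看书": -9.0, "他在图书馆看书": -5.0}  # hypothetical LM scores


def toy_lm(s: str) -> float:
    return TOY_LM.get(s, -20.0)


if __name__ == "__main__":
    print(proofread_grammar("他在图书官看书", toy_spell, toy_beam, toy_lm))
```

In this toy setup the model alone slightly prefers the ungrammatical 他在图书馆的看书, and the language-model term reverses that choice; with `lm_weight=0.0` the model's own top hypothesis would be returned.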
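Results on NLPCC2018 are reported as F0.5, which weights precision more heavily than recall: F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), with beta = 0.5. The sketch below computes it from sets of edits. Representing edits as (sentence id, start, end, replacement) tuples is an assumption, and shared-task scorers such as the MaxMatch (M2) scorer additionally search for the edit alignment that best matches the gold annotation, which is skipped here.

```python
def f_beta(system_edits, gold_edits, beta=0.5):
    """Precision, recall and F_beta over sets of hashable edits.

    This sketch treats an empty denominator as a score of 0."""
    system, gold = set(system_edits), set(gold_edits)
    correct = len(system & gold)
    p = correct / len(system) if system else 0.0
    r = correct / len(gold) if gold else 0.0
    if p + r == 0:
        return p, r, 0.0
    b2 = beta * beta
    return p, r, (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    gold = {("s1", 4, 5, "馆"), ("s2", 2, 3, "得"), ("s3", 5, 6, "号"),
            ("s4", 0, 1, "他"), ("s5", 1, 2, "在"), ("s6", 3, 4, "书")}
    system = {("s1", 4, 5, "馆"), ("s2", 2, 3, "得"), ("s3", 5, 6, "号"),
              ("s4", 0, 1, "她")}
    print(f_beta(system, gold))
```

For this example, 3 correct edits out of 4 proposed against 6 gold edits give P = 0.75, R = 0.5 and F0.5 = 15/22 ≈ 0.68, higher than F1 = 0.6, because beta = 0.5 rewards the relatively high precision.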
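The abstract does not describe how the system's punctuation and numeral proofreading work. As one plausible kind of rule-based check (an assumption, not taken from the thesis), the sketch below flags half-width punctuation directly after a Chinese character and runs of full-width digits, and proposes a replacement for each. The rules, the mapping table and the function name are hypothetical.

```python
import re

# Assumed mapping from half-width to full-width Chinese punctuation.
HALF_TO_FULL = {",": "，", ".": "。", ";": "；", ":": "：", "?": "？", "!": "！"}
FULL_WIDTH_DIGITS = str.maketrans("０１２３４５６７８９", "0123456789")
# Half-width punctuation immediately preceded by a CJK unified ideograph.
HALF_PUNCT_AFTER_HANZI = re.compile(r"(?<=[\u4e00-\u9fff])[,.;:?!]")
FULL_DIGITS = re.compile(r"[０-９]+")


def check_punct_and_digits(text):
    """Return sorted (offset, found, suggestion, message) tuples for suspicious spans."""
    issues = []
    for m in HALF_PUNCT_AFTER_HANZI.finditer(text):
        issues.append((m.start(), m.group(), HALF_TO_FULL[m.group()],
                       "half-width punctuation in Chinese text"))
    for m in FULL_DIGITS.finditer(text):
        issues.append((m.start(), m.group(), m.group().translate(FULL_WIDTH_DIGITS),
                       "full-width digits"))
    return sorted(issues)


if __name__ == "__main__":
    for issue in check_punct_and_digits("今天是２０２０年3月17日,天气很好."):
        print(issue)
```

Returning character offsets with suggestions, rather than rewriting the text directly, leaves the decision to the user, which suits a system with a separate front-end interaction module.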
Keywords/Search Tags: Chinese text proofreading, language model, neural machine translation