
Chinese Spelling Check Based On Pre-training Model

Posted on: 2022-11-11    Degree: Master    Type: Thesis
Country: China    Candidate: L R Li
GTID: 2518306758492094    Subject: Trade Economy
Abstract/Summary:
In recent years, with the rapid development of the Internet, more and more electronic text has appeared in our lives, and typing through input methods has become an indispensable part of everyday text production. Spelling errors made while typing significantly degrade text quality, so accurate spell checking of the generated text is of high practical importance. Chinese spell checking is the task of detecting and correcting errors in Chinese text. Traditional proofreading techniques struggle to cope with today's large data volumes, whereas pre-trained language models have recently achieved excellent performance on a variety of tasks. This thesis proposes two Chinese spell checking algorithms that combine pre-trained models with graph convolutional networks (GCNs):

(1) The Chinese Input Method Spell Check model (CIMSC), based on Whole Word Masking. CIMSC builds on Bert-WWM and expands its vocabulary with a large number of common Chinese phrases so that the model can produce vector representations of phrases. To address spelling errors caused by mixing full and abbreviated pinyin in current Chinese input methods, this thesis proposes custom confusion rules for words that are error-prone under input methods. It analyzes the input habit of combining initial-consonant abbreviations with full spellings and, on this basis, constructs an input-method-based phrase confusion set. This confusion set is then modeled with a graph convolutional network that integrates the relationships between confusable phrases, so that phrase-level spelling errors can be corrected during spell checking. Experiments show that, compared with previous models, CIMSC achieves a clear improvement in spell checking of Chinese input method text and is more practical, which supports the soundness and effectiveness of the model.

(2) A Chinese spelling check method (PSP) that fuses phonetic and phrase information. Previous work on Chinese spelling check has mainly targeted single-character errors. In addition to the phonetic similarity between characters, PSP applies the input-method confusion rules proposed in this thesis and takes phrase information into account during checking. Experiments show that, compared with previous models that consider only phonetic confusion, PSP improves spell checking performance: when correcting, it considers not only phonetic confusion between characters but also the similarity between phrases, which supports the soundness of the model design.

The main innovation of this thesis is constructing confusion sets according to input method rules, using them to connect characters with characters and words with words so as to form a character confusion graph and a word confusion graph, respectively, and then extracting features from these graphs with a graph convolutional network. The experimental results verify the effectiveness of the proposed algorithms and show that they have good practical value.
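The input-method confusion rule described in (1) can be illustrated with a minimal sketch. It assumes the rule works like this: each syllable of a phrase may be typed either in full pinyin or as its initial consonant, and any two phrases that can be produced by the same key sequence are treated as confusable. The toy `LEXICON` and the helpers `initial`, `keystroke_variants`, and `build_confusion_sets` are hypothetical; the abstract does not specify the exact rules or data the thesis uses.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy lexicon: phrase -> full pinyin syllables (tones dropped).
# A real system would use a large phrase vocabulary and a pinyin converter.
LEXICON = {
    "时间": ["shi", "jian"],
    "实践": ["shi", "jian"],
    "视角": ["shi", "jiao"],
    "北京": ["bei", "jing"],
    "背景": ["bei", "jing"],
    "比较": ["bi", "jiao"],
}

# Pinyin initials; two-letter initials come first so they match before "z", "c", "s".
# "y" and "w" are treated as initials, as input methods typically do.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def initial(syllable):
    """Return the syllable's initial consonant, or its first letter if it has none."""
    for ini in INITIALS:
        if syllable.startswith(ini):
            return ini
    return syllable[0]


def keystroke_variants(syllables):
    """All key sequences where each syllable is typed either in full or as its initial."""
    options = [(s, initial(s)) for s in syllables]
    return {"".join(choice) for choice in product(*options)}


def build_confusion_sets(lexicon):
    """Map each phrase to the other phrases that share at least one key sequence."""
    phrases_by_keys = defaultdict(set)
    for phrase, syllables in lexicon.items():
        for keys in keystroke_variants(syllables):
            phrases_by_keys[keys].add(phrase)

    confusion = {phrase: set() for phrase in lexicon}
    for phrases in phrases_by_keys.values():
        for phrase in phrases:
            confusion[phrase] |= phrases - {phrase}
    return confusion


if __name__ == "__main__":
    sets = build_confusion_sets(LEXICON)
    for phrase in sorted(sets):
        print(phrase, sorted(sets[phrase]))
```

Under this assumed rule, 时间 and 视角 become confusable only through abbreviated sequences such as "shj" or "shij", which is the kind of mixed full/abbreviated input the thesis targets. Linking each phrase to the members of its confusion set would give the edges of a phrase confusion graph.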
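The graph convolution step applied to the confusion graphs can be sketched with the standard GCN propagation rule of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where nodes are characters or phrases and edges link confusable pairs. This is an assumption: the abstract does not state which GCN variant, normalization, or node features the thesis uses, and `normalize_adjacency`, `gcn_layer`, and the toy matrices below are hypothetical.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a dense 0/1 adjacency matrix given as nested lists."""
    n = len(adj)
    # Add self-loops so every node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(adj, h, w):
    """One graph-convolution step: ReLU(normalized adjacency @ H @ W)."""
    out = matmul(matmul(normalize_adjacency(adj), h), w)
    return [[max(0.0, v) for v in row] for row in out]


if __name__ == "__main__":
    # Toy graph: nodes 0 and 1 are confusable, node 2 has no confusable partner.
    adj = [[0, 1, 0],
           [1, 0, 0],
           [0, 0, 0]]
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical node features
    w = [[1.0, 0.0], [0.0, 1.0]]              # identity weights for readability
    print(gcn_layer(adj, h, w))  # [[0.5, 0.5], [0.5, 0.5], [1.0, 1.0]]
```

Confusable nodes 0 and 1 end up with averaged features while the isolated node keeps its own, which is how a GCN lets information flow between confusable items. In a full model the output vectors would be combined with the pre-trained model's representations; the abstract does not say how CIMSC or PSP performs that combination.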
Keywords/Search Tags:Chinese Spell Check, Pre-training Model, Graph Convolutional Neural Network, Confusion Set