Research On The Proofreading Method Of Chinese Typos Based On Sequence Labeling Model

Posted on: 2021-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Han
Full Text: PDF
GTID: 2438330602497941
Subject: Computer Science and Technology
Abstract/Summary:
With the arrival of the new-media era, the Internet is filled with massive amounts of user-generated text, much of which may contain irregular expressions. As China's comprehensive national strength grows, the number of learners of Chinese as a foreign language is also increasing day by day, and beginners may make a variety of typos while learning Chinese. This thesis explores the task of Chinese spelling check, that is, the automatic detection and correction of Chinese typos using neural network technology. It covers three aspects: Chinese spelling check based on neural machine translation, Chinese spelling check based on a sequence labeling model, and Chinese spelling check integrating Pinyin and stroke features.

Chinese spelling check based on neural machine translation. Chinese typo proofreading is treated as a machine translation problem from a sequence containing typos to the correctly written sequence. Building on a neural machine translation model, we apply the Transformer and CopyNet to improve the performance of Chinese typo detection and correction. Further exploration shows that different pre-training mechanisms for the word embeddings can improve the model's performance.

Chinese spelling check based on a sequence labeling model. Because the machine-translation-based model relies heavily on the scale of the corpus, and typo corpora are difficult to obtain, we use a sequence labeling model to detect and correct typos, which reduces the complexity of the algorithm. Bi-LSTM and Bi-GRU networks capture the semantic information of the input sequence in both the forward and backward directions, and the Chinese characters in the input sequence are predicted one by one by decoding with a Conditional Random Field layer and a Softmax layer. The final experimental results show that the sequence labeling model outperforms the neural machine translation model on Chinese spelling check.

Chinese spelling check integrating Pinyin and stroke features. Typos consist of shape-similar typos and sound-similar typos. In view of the causes of these two types, we exploit the phonetic and structural information in Chinese characters: on top of the sequence labeling model, we integrate Pinyin and stroke features, encoding them as character-level vectors with a convolutional neural network and joining them with the embeddings of the Chinese characters. The experimental results show that the model with these external features performs better.
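As an illustration of how typo correction can be framed as sequence labeling, the sketch below turns a (typo sentence, corrected sentence) pair into one label per character and then applies predicted labels back to the input. It is a minimal sketch and not the thesis's implementation: the `KEEP` tag, the function names `make_labels` and `apply_labels`, and the restriction to equal-length substitution errors are all assumptions made for this example, and no neural model is included.

```python
from typing import List, Tuple

# Hypothetical tag meaning "this character is already correct".
KEEP = "<keep>"


def make_labels(source: str, target: str) -> List[Tuple[str, str]]:
    """Pair every input character with a label.

    Assumes typos are character substitutions only, so the typo
    sentence and the corrected sentence have the same length.
    """
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    return [(s, KEEP if s == t else t) for s, t in zip(source, target)]


def apply_labels(source: str, labels: List[str]) -> str:
    """Rebuild the corrected sentence from per-character labels."""
    if len(source) != len(labels):
        raise ValueError("need exactly one label per character")
    return "".join(s if lab == KEEP else lab for s, lab in zip(source, labels))


if __name__ == "__main__":
    typo = "我今天很高心"      # sound-similar typo in the last character
    correct = "我今天很高兴"
    pairs = make_labels(typo, correct)
    for char, label in pairs:
        print(char, label)
    print(apply_labels(typo, [label for _, label in pairs]))
```

In a setup like this, a tagger such as a Bi-LSTM with a CRF or Softmax output layer would be trained to predict the label column from the character column, so detection (label is not `KEEP`) and correction (the label itself) come from the same prediction.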
Keywords/Search Tags:Chinese Spelling Check, Neural Machine Translation, Sequence Labeling Model, Recurrent Neural Network