Research On Proofreading Method Of Chinese Phonics Based On Machine Translation Model

Posted on: 2018-05-06  Degree: Master  Type: Thesis
Country: China  Candidate: X Xue  Full Text: PDF
GTID: 2358330515478802  Subject: Computer Science and Technology
Abstract/Summary:
With the widespread use of the Internet, massive amounts of user-generated text are being produced. Unlike regular news text, user-generated text often contains many non-standard forms such as misspellings. Normalizing such text not only makes it smoother and easier to read, but also lays a good foundation for downstream text processing. Most people now type Chinese with a pinyin input method, which leads to many homophone misspellings, so this thesis studies Chinese homophone misspellings in depth. Two approaches are used, statistical machine translation and neural machine translation, and the work covers the following three aspects:

(1) Construction of a homophone misspelling corpus based on Chinese-pinyin-Chinese conversion: Users produce a large number of homophone misspellings, but finding them all by hand costs a great deal of time and labor, so this thesis constructs the homophone misspelling corpus automatically. It simulates the process by which a user who intends the correct word types a wrong one, following the path from correct word to pinyin, to fuzzy (indistinct) pinyin, to wrong word, and adds a neural network maximum entropy model to filter the generated candidates.

(2) Homophone misspelling correction based on the statistical machine translation model: The thesis treats changing a wrong word back to the original word as a translation from the misspelled form into the standard orthography. The reverse of the corpus construction process in part (1) provides the parallel corpus for the statistical machine translation model, from which the language model and translation model are generated to correct homophone misspellings. The experimental results show that the statistical machine translation model is useful for correcting homophone misspellings, and that adding fuzzy pinyin and the neural network maximum entropy model improves correction further.

(3) Homophone misspelling correction based on neural machine translation: Two strategies are used. The first applies an LSTM-based neural machine translation model to correct homophone misspellings directly. The second uses the statistical machine translation model to produce an n-best list of candidates for the wrong word, then selects the best candidate from that list with an RNN model or with an RNN model combined with an N-gram model. The experimental results show that neural machine translation alone cannot be applied directly to homophone misspelling correction, and that selecting from the n-best list with the RNN model combined with the N-gram model performs better.
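The corpus construction step in aspect (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: the character table, the fuzzy-initial rules, and all function names below are hypothetical, tones are ignored, substitution is done per character for simplicity, and the neural network maximum entropy filter is omitted.

```python
# Toy character -> toneless pinyin table (hypothetical; a real system would
# load a full pronunciation lexicon instead).
CHAR_TO_PINYIN = {
    "在": "zai",
    "再": "zai",
    "是": "shi",
    "事": "shi",
    "四": "si",
    "思": "si",
}

# Assumed fuzzy ("indistinct") pinyin rules: pairs of confusable initials.
FUZZY_INITIALS = [("zh", "z"), ("ch", "c"), ("sh", "s")]


def fuzzy_variants(pinyin):
    """Return the syllable itself plus variants with a confusable initial swapped."""
    variants = {pinyin}
    for retroflex, flat in FUZZY_INITIALS:
        if pinyin.startswith(retroflex):
            variants.add(flat + pinyin[len(retroflex):])
        elif pinyin.startswith(flat):
            variants.add(retroflex + pinyin[len(flat):])
    return variants


def build_pinyin_index(char_to_pinyin):
    """Invert the table: pinyin -> set of characters with that reading."""
    index = {}
    for char, pinyin in char_to_pinyin.items():
        index.setdefault(pinyin, set()).add(char)
    return index


def confusion_candidates(char, char_to_pinyin, pinyin_index):
    """Characters reachable via char -> pinyin -> fuzzy pinyin -> character."""
    pinyin = char_to_pinyin.get(char)
    if pinyin is None:
        return set()
    candidates = set()
    for variant in fuzzy_variants(pinyin):
        candidates |= pinyin_index.get(variant, set())
    candidates.discard(char)  # the correct character is not a misspelling
    return candidates


def build_parallel_pairs(sentence, char_to_pinyin):
    """Return (misspelled, correct) pairs, each with one character replaced."""
    pinyin_index = build_pinyin_index(char_to_pinyin)
    pairs = []
    for i, char in enumerate(sentence):
        wrong_chars = confusion_candidates(char, char_to_pinyin, pinyin_index)
        for wrong in sorted(wrong_chars):
            pairs.append((sentence[:i] + wrong + sentence[i + 1:], sentence))
    return pairs
```

Each returned pair has the misspelled text first and the correct text second, which matches the direction described in aspect (2), where the misspelled form is the source side and the standard form is the target side of the parallel corpus.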
Keywords/Search Tags: homophone misspelling correction, statistical machine translation model, recurrent neural network model, neural network maximum entropy