
Research And Implementation Of Pinyin Input Method Based On Language Model

Posted on: 2020-12-05
Degree: Master
Type: Thesis
Country: China
Candidate: L Dai
Full Text: PDF
GTID: 2518306104995479
Subject: Software engineering
Abstract/Summary:
The Pinyin input method is one of the most commonly used pieces of software in daily life. The user types a sequence of letters, and the input method engine predicts the Chinese character string the user intends, either from a trained model or from rules. An accurate and efficient Pinyin input method matters for productivity in everyday life, study, and work. Advances in deep learning have driven the third wave of artificial intelligence and produced breakthroughs in natural language processing (NLP). To build a more intelligent, accurate, and efficient Pinyin input method, this thesis applies a neural network language model to the decoding module of the input method. First, converting Pinyin to Chinese characters is treated as a sequence labeling problem: a Hidden Markov Model (HMM) decodes Pinyin into Chinese characters, that is, it labels each syllable with its corresponding character. To overcome the shortcomings of the HMM in language modeling, a language model is then introduced into the Pinyin-to-character decoding module: it re-evaluates the candidate character sequences produced by HMM decoding and rescores them. Following this approach, four Pinyin-to-character conversion models are trained: HMM + N-gram, HMM + RNN, HMM + LSTM, and HMM + BERT.

Finally, a Pinyin input method is designed, implemented, and tested on top of the trained models. In the implementation, Pinyin syllable segmentation uses a dynamic programming method. Dictionary generation uses entropy-based measures, scoring how plausible a candidate word is by its freedom and cohesion (coagulation), and mines nearly 150,000 words of two to four characters. The top N single-character or word candidates are obtained by rule matching, and the top N whole-sentence candidates are obtained by HMM decoding followed by rescoring with the neural network language model. The finished input method supports single-character input, word input, whole-sentence input, associative input, and dynamic frequency adjustment. Using the trained language model, it converts long sentences effectively, which demonstrates the advantage of the neural network language model.
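The dynamic programming syllable segmentation mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not say what the dynamic program optimizes, so the sketch assumes a "fewest syllables" objective. The `SYLLABLES` table is a tiny hypothetical subset of the full Pinyin syllable inventory, and the names `segment` and `MAX_SYLLABLE_LEN` are invented for the example.

```python
from typing import List, Optional

# Hypothetical toy syllable table; a real input method would load the
# full inventory of valid Pinyin syllables.
SYLLABLES = {
    "a", "an", "ang", "e", "en", "fa", "fang", "gan", "gang",
    "guo", "hao", "ni", "ru", "shu", "xi", "xian", "zhong",
}

# Longest standard Pinyin syllables (e.g. "zhuang") have six letters.
MAX_SYLLABLE_LEN = 6


def segment(letters: str) -> Optional[List[str]]:
    """Split a letter string into valid syllables, minimizing their count.

    Returns None when no complete segmentation exists.
    """
    n = len(letters)
    inf = float("inf")
    # best[i] = minimum number of syllables that exactly cover letters[:i]
    best = [inf] * (n + 1)
    # back[i] = start index of the last syllable in that best cover
    back = [-1] * (n + 1)
    best[0] = 0

    for i in range(1, n + 1):
        for j in range(max(0, i - MAX_SYLLABLE_LEN), i):
            if best[j] + 1 < best[i] and letters[j:i] in SYLLABLES:
                best[i] = best[j] + 1
                back[i] = j

    if best[n] == inf:
        return None

    # Walk the back-pointers from the end to recover the syllables.
    result: List[str] = []
    i = n
    while i > 0:
        result.append(letters[back[i]:i])
        i = back[i]
    return result[::-1]


if __name__ == "__main__":
    for text in ("nihao", "zhongguo", "xian", "xyz"):
        print(text, "->", segment(text))
```

Under the fewest-syllables assumption, `segment("xian")` returns `["xian"]` rather than `["xi", "an"]`. A practical engine would likely keep both readings as candidates and let the decoder choose between them.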
Keywords/Search Tags: Input method, Hidden Markov, Language model, Deep learning