
Research on a Pinyin Input Method for Non-Native Learners of Chinese

Posted on: 2021-03-06  Degree: Master  Type: Thesis
Country: China  Candidate: J Yao  Full Text: PDF
GTID: 2428330605482463  Subject: Computer technology
Abstract/Summary:
With the rapid growth of China's economic strength and the wide spread of Chinese culture, the number of Chinese learners who are not native speakers is increasing. The difficulties of learning Chinese fall mainly into three areas: 1) Inaccurate grasp of tones. Learners find it hard to relate tones to their mother tongue, and hard to master tones and tone changes in spoken Chinese; 2) One pronunciation, many written forms. Characters are hard to recognize because Chinese has a large number of homophones and the characters have intricate structures; 3) A rich and varied vocabulary. The size of the Chinese vocabulary and its flexible usage lead to usage errors.

Learners inevitably need to type Chinese on their mobile phones while studying, so a Chinese input method is an indispensable tool. However, existing Pinyin input methods are designed for native speakers of Chinese and do not take into account that learners are confused when they have to choose among many candidates. Building on existing Pinyin input methods, this thesis designs and implements a Pinyin input method for Chinese learners, called SeeIME. The main contributions are as follows:

(1) In response to the confusion learners experience with Pinyin input methods, we redesign the input method. First, we propose that, after typing the pinyin, the user can additionally type the first two letters of the English translation of the intended Chinese characters. This greatly reduces the number of candidates and improves pinyin-to-character conversion; the highest accuracy on the test set reached 96.19%. Second, we propose a three-part candidate column to make candidates easier to read: in addition to the Chinese characters, it displays the corresponding pinyin with tone marks and the English translation. Finally, we propose combining the input method with a dictionary, so that the input method helps users deepen their understanding of Chinese characters.

(2) We examine how the n-gram language model performs in pinyin-to-character conversion under different smoothing algorithms, and propose a new smoothing algorithm that improves overall performance in the input method by combining the Kneser-Ney and Modified Kneser-Ney smoothing algorithms. To address the long-distance dependency problem of the n-gram language model, this thesis applies the GPT-2 language model to an input method for the first time, further improving pinyin-to-character conversion. On the two test sets, conversion performance improved by 3% and 1%, respectively.

(3) Beyond high-quality pinyin-to-character conversion, an input method can offer auxiliary functions that help users enter text. This thesis combines a chat machine with the input method and uses the typed pinyin to make the chat machine generate sentences that meet the user's requirements; we call this a pinyin chat machine (PCM). Based on the Seq2Seq model, two schemes are proposed for incorporating pinyin information: simple fusion and fusion with reading gates. Experimental analysis shows that PCM adapts well to pinyin input and that generating reply content conditioned on pinyin is feasible.
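The candidate-narrowing idea in contribution (1) can be illustrated with a minimal sketch. It is not SeeIME's actual implementation: the tiny lexicon, the `Entry` type, and the functions `candidates` and `render` are hypothetical names invented for this example. The sketch only shows how a two-letter English prefix typed after the pinyin can shrink a list of homophone candidates, and how a candidate might be shown with its character, toned pinyin, and English gloss.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    hanzi: str    # Chinese character
    pinyin: str   # toneless pinyin, as typed by the user
    toned: str    # pinyin with tone mark, for display
    english: str  # short English gloss


# Hypothetical toy lexicon; a real input method would use a full dictionary.
LEXICON = [
    Entry("是", "shi", "shì", "be"),
    Entry("十", "shi", "shí", "ten"),
    Entry("时", "shi", "shí", "time"),
    Entry("事", "shi", "shì", "matter"),
    Entry("市", "shi", "shì", "city"),
    Entry("马", "ma", "mǎ", "horse"),
]


def candidates(pinyin: str, english_prefix: str = "") -> list[Entry]:
    """Return entries matching the pinyin, optionally filtered by the
    first letters of the English gloss (the abstract uses two letters)."""
    prefix = english_prefix.lower()
    return [
        e for e in LEXICON
        if e.pinyin == pinyin and e.english.lower().startswith(prefix)
    ]


def render(entry: Entry) -> str:
    """Format one candidate in three parts: character, toned pinyin, gloss."""
    return f"{entry.hanzi} | {entry.toned} | {entry.english}"


if __name__ == "__main__":
    print(len(candidates("shi")))            # all homophones for "shi"
    for e in candidates("shi", "te"):        # pinyin plus English prefix
        print(render(e))
```

In this toy lexicon, typing `shi` alone leaves five homophones to choose from, while `shi` followed by `te` leaves only the entry glossed as "ten". Candidate ranking by a language model, which the thesis also studies, is deliberately left out of the sketch.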
Keywords/Search Tags:pinyin input method, language model, smoothing algorithm, GPT-2, pinyin chat machine