Deep Learning Based Chinese Pinyin Input Method

Posted on: 2020-11-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Huang
GTID: 2518306185999929
Subject: Computer Science and Technology
Abstract/Summary:
A Chinese pinyin input method editor (IME) converts pinyin into characters so that Chinese text can be conveniently entered into a computer with a standard keyboard. IMEs rely on their core component, pinyin-to-character (P2C) conversion. In recent years, deep learning has been widely used in various natural language processing tasks, yet research on applying neural networks to IME development has been almost nonexistent. This thesis analyzes the feasibility of applying deep learning to pinyin input methods and proposes four methods to improve the IME user experience: neural P2C conversion, online vocabulary updating, pre-training models, and an aided IME. We introduce these methods in detail as follows.
We use a neural P2C conversion model based on the sequence-to-sequence (Seq2Seq) framework. P2C is the core component of pinyin-based IMEs. Regarding the pinyin sequence as a language, P2C can be naturally formulated as a machine translation task. Experiments show that this method improves P2C conversion quality compared with the traditional method.
To augment representation learning for P2C, we use multi-granularity word embedding enhancement, proposing character-enhanced and subword-enhanced embeddings for this core IME task, together with a gated-attention mechanism. The neural P2C model also encodes the previous input utterance as extra context, which enables the IME to predict character sequences from incomplete pinyin input. Evaluated on different benchmark datasets, the model shows a clear user-experience improvement over traditional models.
We combine an adaptive dictionary update algorithm, a target-vocabulary sampling mechanism, and an online learning training method to realize open-vocabulary learning in the neural IME. Experiments show that this approach helps the IME effectively follow the user's input behavior.
We present Moon IME, a pinyin IME that contains a high-quality P2C module and an extended module based on information retrieval. The former is built on an attention-based neural machine translation (NMT) model; the latter contains follow-up prediction and machine translation modules for typing assistance. With a customizable design, the association cloud platform can be adapted to specific domains, including those with complex specialized terminology. Usability analysis shows that the core engine achieves conversion quality comparable to state-of-the-art research models, and that the association function is stable and well suited to a broad range of users. The IME makes it more convenient to obtain complete, additional, and even corrected character outputs, especially when the user's input is incomplete or incorrect. The released IME is implemented on Windows via the Text Services Framework (TSF).
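The abstract names a gated-attention mechanism for the P2C model but does not define it. As an illustration only, the sketch below assumes the formulation of the Gated-Attention Reader (Dhingra et al., 2017): each pinyin vector is multiplied element-wise by an attention-weighted summary of the context vectors, which here stand in for the encoded previous utterance. The function `gated_attention` and the plain-list vector representation are hypothetical; a real model would use learned, batched tensors.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_attention(pinyin: List[Vector], context: List[Vector]) -> List[Vector]:
    """Gate each pinyin vector with an attention summary of the context.

    Hypothetical formulation, after the Gated-Attention Reader. For each x_i:
        alpha_i = softmax([c_j . x_i for each context vector c_j])
        q_i     = sum_j alpha_ij * c_j
        out_i   = x_i * q_i   (element-wise product, the "gate")
    All vectors are assumed to share the same dimension.
    """
    gated: List[Vector] = []
    for x in pinyin:
        alpha = softmax([dot(c, x) for c in context])
        # Attention-weighted summary of the context for this pinyin position.
        q = [sum(a * c[k] for a, c in zip(alpha, context)) for k in range(len(x))]
        gated.append([xk * qk for xk, qk in zip(x, q)])
    return gated


if __name__ == "__main__":
    pinyin_vecs = [[1.0, 0.0], [0.0, 1.0]]
    context_vecs = [[2.0, 0.0], [0.0, 2.0]]
    print(gated_attention(pinyin_vecs, context_vecs))
```

The multiplicative gate lets context dimensions that agree with a pinyin position amplify it, while dimensions the context does not support are scaled toward zero.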
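The open-vocabulary method combines adaptive dictionary updates, target-vocabulary sampling, and online learning, but the abstract does not give the algorithm. The following is only a minimal Python sketch of two of those ingredients, under our own assumptions: a vocabulary that grows as the user commits words it has not seen, and a softmax restricted to a sampled target vocabulary (the gold words of the batch, topped up with the user's most frequent words), in the spirit of the large-target-vocabulary NMT approach of Jean et al. (2015). The names `AdaptiveVocab`, `observe`, `sample_target_vocab`, and `restricted_softmax` are hypothetical, and the training step itself is omitted.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Sequence


class AdaptiveVocab:
    """Hypothetical open vocabulary that grows as the user commits new words."""

    def __init__(self, words: Iterable[str]) -> None:
        self.index: Dict[str, int] = {}
        self.freq: Counter = Counter()
        for word in words:
            self.add(word)

    def add(self, word: str) -> int:
        # Give an unseen word the next free id; existing ids never change,
        # so ids already used by the model stay valid.
        if word not in self.index:
            self.index[word] = len(self.index)
        return self.index[word]

    def observe(self, committed: Sequence[str]) -> None:
        # Online dictionary update: record the words the user actually committed.
        for word in committed:
            self.add(word)
            self.freq[word] += 1


def sample_target_vocab(vocab: AdaptiveVocab,
                        batch_targets: Sequence[Sequence[str]],
                        size: int) -> List[int]:
    """Ids of a reduced target vocabulary for one batch.

    Every gold target word is always included (its loss needs it), even if
    that exceeds `size`; any remaining budget is filled with the user's most
    frequent words as extra candidates.
    """
    ids: List[int] = []
    seen = set()
    for sentence in batch_targets:
        for word in sentence:
            wid = vocab.add(word)
            if wid not in seen:
                seen.add(wid)
                ids.append(wid)
    for word, _count in vocab.freq.most_common():
        if len(ids) >= size:
            break
        wid = vocab.index[word]
        if wid not in seen:
            seen.add(wid)
            ids.append(wid)
    return ids


def restricted_softmax(logits: Sequence[float], ids: Sequence[int]) -> Dict[int, float]:
    # Normalise over the sampled ids only, instead of the full vocabulary.
    m = max(logits[i] for i in ids)
    exps = {i: math.exp(logits[i] - m) for i in ids}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}
```

In a full system, each word added by `observe` would also need a new output-layer row that online training then updates; restricting the softmax to the sampled ids keeps that per-step cost independent of how large the user's vocabulary grows.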
Keywords: Pinyin input method, Deep learning, Aided input