The Improvement And Implementation Of Mixed Language Model On Japanese Input Method

Posted on:2013-07-11

Degree:Master

Type:Thesis

Country:China

Candidate:L Chen

Full Text:PDF

GTID:2268330392469540

Subject:Software engineering

Abstract/Summary:

As computing has developed, computers have come to touch every aspect of daily life and work. Information input is an important link in human-computer interaction, so demand for a well-developed intelligent input method keeps growing, and the significance of the Input Method Editor (IME) is increasingly evident.

With advances in Natural Language Processing, the traditional word-model-based IME has evolved into an intelligent input method based on a language model. Taking the entry as its unit of granularity, the language model raises conversion accuracy when a whole sentence is converted, and thus pinyin-to-character conversion accuracy is greatly improved. However, a complete language model is too large to be used in an IME, so the model must be compressed to fit the application. Common IMEs generally compress the language model by pruning, retaining only core entries. This thesis instead adopts an entry-clustering method, made possible by the stricter rules of the Japanese language. Relationships between entries are replaced by relationships between word classes, so that the corpus is better utilized and the sparsity of the language model is greatly reduced.

To reduce the information lost during language model compression, the author improves the clustering method based on word classes: a k-means clustering algorithm is proposed that clusters entries according to the distance between them and reduces code duplication within the same word class. Entry frequency within a word class is also taken into account, so that the information loss caused by merging entries with different frequencies can be avoided. For the loss that compression cannot avoid, a bigram model is used to compensate. At the same time, the accuracy of the pronunciation model is improved and its coverage is raised accordingly.

Finally, a scalable hybrid language model is established, comprising a 2-pos model, a 2-gram model, and a pronunciation model. By integrating the features of these three models, the new model improves the conversion accuracy of the IME. Comparative tests are run on the different models, and the influence of the hybrid language model on IME conversion accuracy is analyzed.
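
The idea of replacing entry-to-entry relationships with class-to-class relationships, and then using a bigram model to compensate for what the class model loses, can be illustrated with a minimal Python sketch. It is not the thesis implementation. The abstract does not say how the models are combined, so the linear interpolation weight `lam`, the function names, and the toy romanized corpus are all assumptions made for illustration; the pronunciation model and the k-means clustering step are left out, and the word-to-class mapping is simply given by hand.

```python
from collections import Counter

def train(sentences, word_class):
    """Count word-level and class-level statistics from tokenized sentences."""
    m = {"cls": word_class,
         "w_uni": Counter(), "w_ctx": Counter(), "w_bi": Counter(),
         "c_uni": Counter(), "c_ctx": Counter(), "c_bi": Counter()}
    for sent in sentences:
        classes = [word_class[w] for w in sent]
        m["w_uni"].update(sent)
        m["c_uni"].update(classes)
        # Context counts: how often a word/class is followed by something.
        m["w_ctx"].update(sent[:-1])
        m["c_ctx"].update(classes[:-1])
        m["w_bi"].update(zip(sent, sent[1:]))
        m["c_bi"].update(zip(classes, classes[1:]))
    return m

def ratio(num, den):
    """Relative frequency, defined as 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def word_bigram_prob(m, prev, word):
    """Unsmoothed estimate of P(word | prev) from word bigram counts."""
    return ratio(m["w_bi"][(prev, word)], m["w_ctx"][prev])

def class_bigram_prob(m, prev, word):
    """Class-based estimate: P(class | prev class) * P(word | class)."""
    c_prev, c = m["cls"][prev], m["cls"][word]
    transition = ratio(m["c_bi"][(c_prev, c)], m["c_ctx"][c_prev])
    emission = ratio(m["w_uni"][word], m["c_uni"][c])
    return transition * emission

def hybrid_prob(m, prev, word, lam=0.5):
    """Assumed combination: linear interpolation of the two estimates."""
    return (lam * word_bigram_prob(m, prev, word)
            + (1 - lam) * class_bigram_prob(m, prev, word))

# Hypothetical toy data: romanized tokens with hand-assigned classes.
word_class = {"watashi": "PRON", "kare": "PRON",
              "ga": "PART", "wa": "PART",
              "iku": "VERB", "kuru": "VERB"}
sentences = [["watashi", "ga", "iku"], ["kare", "wa", "kuru"]]
model = train(sentences, word_class)

# "watashi wa" never occurs as a word bigram, but PRON -> PART does.
print(word_bigram_prob(model, "watashi", "wa"))   # 0.0
print(class_bigram_prob(model, "watashi", "wa"))  # 0.5
print(hybrid_prob(model, "watashi", "wa"))        # 0.25
```

In this toy case the word bigram model gives the unseen pair a probability of zero, while the class model still assigns it mass because the class transition was observed. That is the sparsity reduction the abstract describes; the word bigram term in turn preserves entry-specific information that the class model merges away.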

Keywords/Search Tags:

Japanese IME, language model, clustering algorithm, pronunciation model

Related items

1	Research On Objective Evaluation Of Pronunciation Quality In An Interactive Language Learning System
2	Research On The Pronunciation Method In Both Chinese And English Based On DIVA Model
3	The National Language And Accent Pronunciation Dictionary Adaptive Mandarin Speech Recognition
4	A Research On Key Technology Of Computer Assisted Putonghua Pronunciation Assessment
5	Motion Analysis And Synthesis System Of Pronunciation
6	HMM-based Pronunciation People Switching System
7	Research On The Pronunciation Error Detection Of Tibetan Students' The National Common Language Based On CNN
8	Researching Of The Mongolian Language Model Based On Speech Recognition
9	Research On English-Chinese Name Transliteration
10	Computer Analysis-based Pronunciation Quality Assessment In Language Learning System