The Research And Implementation Of Discovery Of New Words For WI Input Method

Posted on: 2012-11-09
Degree: Master
Type: Thesis
Country: China
Candidate: C B Zhou
Full Text: PDF
GTID: 2218330362451672
Subject: Computer technology
Abstract/Summary:
A Pinyin input method converts an alphabetic string into a string of Chinese characters. The accuracy of the conversion depends largely on whether the dictionary covers common words, especially new words. Adding new words to the dictionary manually takes considerable effort. New word discovery technology finds new words automatically from large-scale text, which removes most of that manual work. This thesis explores new word discovery technology and adds the discovered words to the dictionary used by the input method to increase its accuracy.

First, this thesis discusses mining methods for two kinds of new words: emotional words and commodity words. For emotional words, it discusses an iterative method for mining Chinese emotional words based on the maximum-flow minimum-cut principle. Experimental results show that this method is highly effective at mining subjective words and performs better than traditional subjective term mining based on a statistical model. For commodity words, the data source is the user search log of a shopping site. The thesis first segments the user queries into words according to the characteristics of the search log data. It then calculates the conditional probability of the candidate strings using the N-gram increasing algorithm and string frequency statistics. Finally, it selects the commodity words.

Finally, this thesis describes the development process of an input method for Apple's iOS platform and shows the important role that new word discovery technology plays in the WI input method. The WI input method, a sentence-level Chinese input method, was developed by the Web Intelligence Research Center of the Department of Computer Science at Harbin Institute of Technology. It was released on November 11, 2010, and now has more than 120,000 users. Its accuracy and fluency have received high praise from a large number of users.
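
The commodity-word step described above (segment queries, then score candidate strings by conditional probability while growing the N-gram length) might look roughly like the following Python sketch. The abstract does not give the details of the N-gram increasing algorithm, so everything here is an assumption made for illustration: the function names `count_ngrams` and `grow_candidates`, the thresholds `min_freq` and `min_cond_prob`, the rule of extending a candidate by one token at a time, and the toy `SAMPLE_QUERIES` data.

```python
from collections import Counter

# Toy, already-segmented search queries (hypothetical data, not from the thesis).
SAMPLE_QUERIES = [
    ["apple", "phone", "case"],
    ["apple", "phone", "case"],
    ["apple", "phone"],
    ["phone", "case"],
    ["apple", "laptop"],
]


def count_ngrams(queries, max_n):
    """Count every token n-gram with 1 <= n <= max_n over the segmented queries."""
    counts = Counter()
    for tokens in queries:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def grow_candidates(queries, max_n=3, min_freq=2, min_cond_prob=0.6):
    """Grow candidate strings one token at a time (assumed reading of the method).

    An n-gram is accepted when it is frequent enough, its (n-1)-token prefix
    was itself kept in the previous round, and the conditional probability
    count(n-gram) / count(prefix) reaches min_cond_prob.
    Returns a dict mapping each accepted n-gram to that conditional probability.
    """
    counts = count_ngrams(queries, max_n)
    # Round 1: frequent single tokens seed the search.
    frontier = {g for g, c in counts.items() if len(g) == 1 and c >= min_freq}
    accepted = {}
    for n in range(2, max_n + 1):
        next_frontier = set()
        for gram, c in counts.items():
            if len(gram) != n or c < min_freq:
                continue
            prefix = gram[:-1]
            if prefix not in frontier:
                continue
            cond_prob = c / counts[prefix]
            if cond_prob >= min_cond_prob:
                accepted[gram] = cond_prob
                next_frontier.add(gram)
        frontier = next_frontier
    return accepted


if __name__ == "__main__":
    for gram, prob in grow_candidates(SAMPLE_QUERIES).items():
        print(" ".join(gram), round(prob, 3))
```

In this sketch a candidate survives only if both its raw frequency and its conditional probability given the shorter prefix are high, which is one simple way to combine the two statistics the abstract mentions. The thesis may use a different scoring or selection rule.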
Keywords/Search Tags: new word mining, input method, maximum flow minimum cut, N-gram increasing algorithm