
Study Of Application Of A Language Model Combining Statistics And Rules In Chinese Input Method

Posted on: 2009-12-13
Degree: Master
Type: Thesis
Country: China
Candidate: J Huang
Full Text: PDF
GTID: 2178360272978161
Subject: Computer application technology
Abstract/Summary:
With the spread of computers and the Internet, the demand for language information processing keeps growing, and automatic means are urgently needed to handle the flood of language information. Against this background, keyboard-based Chinese input, an important part of the interface between computer and user, has become a research focus in Chinese information processing. Statistical language models are widely used in keyboard-based Chinese input methods because they are robust, simple, and easy to implement. However, they suffer from data sparsity, domain dependence, large model size, and slow decoding. In addition, purely statistical language models take little account of linguistic rules such as word meaning, sentence meaning, and contextual relationships. As a result, their output in applications can be unreasonable, which degrades the performance of the input method.

To address these issues, this thesis proposes a language model that combines statistics and rules and applies it to a Chinese input method. For smoothing and compression, it combines a discounting method based on Good-Turing estimation with the Katz method, and it compresses the model using a method based on unit importance. For rule extraction, it uses semantic fields to extract special word-combination rules in the words net. For domain adaptation, it presents a method that combines a basic language model with a user model so that the language model adapts itself.
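As an illustration of the adaptation idea above, one common way to combine a basic model with a user model is linear interpolation of their bigram probabilities. The abstract does not say which combination the thesis uses, so the sketch below is an assumption throughout: the class name `AdaptiveBigramModel`, the `user_weight` parameter, and the choice of bigrams are all hypothetical.

```python
from collections import Counter


class AdaptiveBigramModel:
    """Hypothetical sketch: interpolate a fixed base model with a user model."""

    def __init__(self, base_probs, user_weight=0.3):
        # base_probs maps (previous_word, word) to a probability taken
        # from a pre-trained, already smoothed basic language model.
        self.base_probs = base_probs
        # Assumed fixed interpolation weight given to the user model.
        self.user_weight = user_weight
        self.user_bigrams = Counter()
        self.user_contexts = Counter()

    def observe(self, prev, word):
        """Record one bigram that the user actually confirmed."""
        self.user_bigrams[(prev, word)] += 1
        self.user_contexts[prev] += 1

    def prob(self, prev, word):
        """Return the interpolated probability P(word | prev)."""
        base = self.base_probs.get((prev, word), 0.0)
        seen = self.user_contexts[prev]
        if seen == 0:
            # No user history for this context: fall back to the base model.
            return base
        user = self.user_bigrams[(prev, word)] / seen
        return (1.0 - self.user_weight) * base + self.user_weight * user
```

With this design, each confirmed candidate shifts probability toward the user's own word choices, while contexts the user has never typed keep the basic model's estimate unchanged.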
Finally, for implementation, this thesis presents an algorithm that searches for the optimal path through the words net to convert the input sequence into the output sequence, keeping the conversion fast and the results reasonable.

By introducing language rules, this thesis overcomes, to a certain extent, the shortcomings of purely statistical language models. Applying this work to a stroke-based input method shows that it effectively improves both the decoding accuracy and the performance of the language model.
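The optimal-path search over a words net mentioned above can be pictured as a Viterbi-style dynamic program. The abstract gives no details of the thesis's algorithm, so the following is only a minimal sketch under stated assumptions: the words net is represented as a list of `(start, end, word)` edges over input positions with `end > start`, the score is a bigram log-probability supplied by the caller, and the names `best_path` and `bigram_logprob` are hypothetical.

```python
def best_path(n, edges, bigram_logprob):
    """Find the highest-scoring word sequence covering positions 0..n.

    edges: list of (start, end, word) candidates with end > start.
    bigram_logprob: function (previous_word, word) -> log-probability.
    """
    by_start = {}
    for start, end, word in edges:
        by_start.setdefault(start, []).append((end, word))

    # best[(position, last_word)] = (score, back-pointer to previous state)
    best = {(0, "<s>"): (0.0, None)}

    # Visit positions left to right; every edge moves forward, so all
    # states ending at `pos` are final by the time `pos` is expanded.
    for pos in range(n):
        states = [(k, v) for k, v in best.items() if k[0] == pos]
        for (p, prev), (score, _) in states:
            for end, word in by_start.get(pos, []):
                cand = score + bigram_logprob(prev, word)
                key = (end, word)
                if key not in best or cand > best[key][0]:
                    best[key] = (cand, (p, prev))

    finals = [(v[0], k) for k, v in best.items() if k[0] == n]
    if not finals:
        return []  # no candidate path covers the whole input

    _, key = max(finals)
    words = []
    while best[key][1] is not None:
        words.append(key[1])
        key = best[key][1]
    return words[::-1]
```

Because each state keeps only its best predecessor, the work grows with the number of edges and states in the words net rather than with the number of possible paths, which is one way a decoder can stay fast while still returning the globally best sequence under the chosen score.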
Keywords/Search Tags: Statistical Language Model, Words Net, Language Rules, User Model, Seeking of Optimal Path Based on Words Net