Research And Implementation Of Smooth Techniques And Compression Techniques For Statistical Language Model

Posted on: 2013-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: M X Di
GTID: 2248330395955652
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the information society, the importance and urgency of using computers to process language have become increasingly apparent, and natural language processing systems have developed rapidly. According to empirical research in computational linguistics, the core of a natural language processing system is the statistical language model, a mathematical model that uses statistical methods to describe the rules of natural language.

The development of statistical language models currently faces two major problems: data sparsity and large model scale. After a model is built, it therefore needs to be smoothed and compressed. This thesis studies smoothing and compression techniques for widely used statistical language models, with a focus on compression. After introducing the currently available smoothing and compression techniques, it proposes an improved version of the average-count method and optimizes the calculation used in the relative-entropy-based pruning method. For grouping methods, it proposes a grouping method based on variance. The final compression method proposed in this thesis combines the relative-entropy-based pruning method with the variance-based grouping method.

In the final part of the thesis, an experimental platform for testing statistical language model performance is used to evaluate the improved techniques. The platform computes the perplexity of a model to assess the smoothing method, and the error rate of a Chinese Pinyin input method is used to assess the compression methods. The experimental results show that the improved techniques presented in this thesis perform better than the original methods.
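Since the abstract uses perplexity to compare smoothing methods, a minimal sketch of that measurement may help. It assumes a bigram model and uses add-k smoothing purely as a stand-in; it is not the thesis's improved average-count method, and the function names (`train_bigram`, `bigram_prob`, `perplexity`) and the parameter `k` are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count bigram histories and bigrams in a list of tokens."""
    history_counts = Counter(tokens[:-1])            # how often each word precedes another
    bigram_counts = Counter(zip(tokens[:-1], tokens[1:]))
    vocab = set(tokens)
    return history_counts, bigram_counts, vocab


def bigram_prob(prev, word, history_counts, bigram_counts, vocab_size, k=1.0):
    """Add-k smoothed estimate of P(word | prev).

    Add-k is only an illustrative smoothing method (an assumption of this
    sketch); k must be > 0 so unseen bigrams get non-zero probability.
    """
    return (bigram_counts[(prev, word)] + k) / (history_counts[prev] + k * vocab_size)


def perplexity(test_tokens, history_counts, bigram_counts, vocab, k=1.0):
    """Perplexity of the smoothed bigram model on a test token list."""
    pairs = list(zip(test_tokens[:-1], test_tokens[1:]))
    if not pairs:
        raise ValueError("need at least two test tokens")
    log_sum = sum(
        math.log2(bigram_prob(p, w, history_counts, bigram_counts, len(vocab), k))
        for p, w in pairs
    )
    # Perplexity is 2 raised to the negative average log2 probability.
    return 2 ** (-log_sum / len(pairs))


if __name__ == "__main__":
    counts = train_bigram("a b a b a b".split())
    print(perplexity(["a", "b"], *counts))  # frequently seen bigram
    print(perplexity(["a", "a"], *counts))  # unseen bigram
```

Lower perplexity means the model assigns higher probability to the test text, so under this measure a better smoothing method is one that yields lower perplexity on held-out data.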
Keywords/Search Tags:Statistical Language Model, Smooth, Compression, average-count, Entropy-based, Variance