Smoothing a probability distribution so that it generalizes well is a difficult machine learning problem, and it is particularly challenging when building a statistical language model from insufficient training data. We have developed a new smoothing algorithm, called AlphaRank, that addresses the data sparseness problem by viewing language as a large graph in which each word is a vertex and the probability of transitioning to another word is given by the weight of the edge connecting the two vertices. Thus, instead of the frequency-based rules used in prior work, we propose a graph-based method for smoothing a statistical language model. Our method combines features of context-dependent probability estimators, such as n-grams, with features of context-independent estimators, such as the steady-state distribution of a discrete-time Markov chain. We tested our method on a large collection of Arabic newswire articles and found it superior to previous approaches as measured by perplexity.
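To make the steady-state idea concrete, the following sketch computes the stationary distribution of a discrete-time Markov chain over a toy word graph using damped power iteration. This is a hypothetical illustration of the general technique (the word list, counts, and damping value are invented for the example), not the paper's AlphaRank algorithm itself.

```python
import numpy as np

# Toy word graph: vertices are words, edge weights are bigram counts.
# (Illustrative data only; not from the paper's Arabic corpus.)
words = ["the", "cat", "sat", "mat"]
counts = np.array([
    [0, 4, 0, 3],   # "the" -> ...
    [1, 0, 5, 0],   # "cat" -> ...
    [2, 0, 0, 1],   # "sat" -> ...
    [3, 0, 1, 0],   # "mat" -> ...
], dtype=float)

# Row-normalize counts into the transition matrix P of the Markov chain.
P = counts / counts.sum(axis=1, keepdims=True)

# Damping (as in PageRank) mixes in a uniform jump, guaranteeing the
# chain is ergodic so a unique stationary distribution exists even
# when the underlying word graph is sparse.
alpha = 0.85
n = len(words)
G = alpha * P + (1 - alpha) * np.ones((n, n)) / n

# Power iteration: pi converges to the steady-state distribution pi = pi @ G.
pi = np.full(n, 1.0 / n)
for _ in range(200):
    pi = pi @ G

print(dict(zip(words, np.round(pi, 3))))
```

The resulting vector `pi` gives each word a context-independent probability that could, in principle, be interpolated with context-dependent n-gram estimates.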