Smoothing a probability distribution so that it generalizes well is a difficult machine learning problem, and it is particularly challenging when building a statistical language model from insufficient training data. We have developed a new smoothing algorithm, called AlphaRank, that addresses the data sparseness problem by viewing language as a large graph in which each word is a vertex and the probability of transitioning to another word is given by the weight of the edge connecting the two vertices. Thus, instead of the frequency-based rules used in prior work, we propose a graph-based method for smoothing a statistical language model. Our method combines features of context-dependent probability estimators, such as n-grams, with features of context-independent estimators, such as the steady-state distribution of a discrete-time Markov chain. We tested our method on a large collection of Arabic newswire articles and found it superior to previous approaches as measured by perplexity.
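To make the steady-state idea concrete, the following sketch computes the stationary distribution of a discrete-time Markov chain over a toy word graph using damped power iteration. This is a hypothetical illustration of the general technique (the word list, counts, and damping value are invented for the example), not the paper's AlphaRank algorithm itself.

```python
import numpy as np

# Toy word graph: vertices are words, edge weights are bigram counts.
# (Illustrative data only; not from the paper's Arabic corpus.)
words = ["the", "cat", "sat", "mat"]
counts = np.array([
    [0, 4, 0, 3],   # "the" -> ...
    [1, 0, 5, 0],   # "cat" -> ...
    [2, 0, 0, 1],   # "sat" -> ...
    [3, 0, 1, 0],   # "mat" -> ...
], dtype=float)

# Row-normalize counts into the transition matrix P of the Markov chain.
P = counts / counts.sum(axis=1, keepdims=True)

# Damping (as in PageRank) mixes in a uniform jump, guaranteeing the
# chain is ergodic so a unique stationary distribution exists even
# when the underlying word graph is sparse.
alpha = 0.85
n = len(words)
G = alpha * P + (1 - alpha) * np.ones((n, n)) / n

# Power iteration: pi converges to the steady-state distribution pi = pi @ G.
pi = np.full(n, 1.0 / n)
for _ in range(200):
    pi = pi @ G

print(dict(zip(words, np.round(pi, 3))))
```

The resulting vector `pi` gives each word a context-independent probability that could, in principle, be interpolated with context-dependent n-gram estimates.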