The language model, which is responsible for converting Pinyin into Chinese words, plays a key role in speech recognition, and its performance has a direct impact on recognition results. The most widely used language model today is the statistical language model, and data sparseness is one of the main problems it must face. Moreover, a statistical language model takes only local information into account, so it is worthwhile to add global information to it. Many smoothing techniques are applied in statistical language models, including Katz smoothing and Church-Gale smoothing, which are widely used in speech recognition.

In this thesis, we adopt Bellegarda's latent semantic analysis (LSA) language model to incorporate global information into the statistical language model. The latent semantic language model predicts the probability of word occurrence from the perspective of content, so it is a good complement to the statistical language model. Through singular value decomposition (SVD) of the word-document matrix, all documents and words are represented as vectors of the same dimension, and the similarity of the corresponding vectors measures how strongly a document predicts a word's occurrence. Combining the statistical language model with the latent semantic language model yields a new mixed language model that considers both local and global information. Perplexity, an important measure of language-model performance, can be used to compare the mixed language model with the statistical language model.

In the experiment, a bigram language model with Katz smoothing and a directly modeled latent semantic language model are constructed and combined into a mixed language model. The experimental results show that, compared with the bigram language model, the perplexity of the mixed model declines and performance improves.
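The experiment's first component is a bigram model with Katz smoothing. The sketch below illustrates the back-off idea on a toy corpus; for brevity it replaces the Good-Turing discount of full Katz smoothing with a fixed discount d, and the corpus and d = 0.5 are illustrative assumptions, not details from the thesis.

```python
# Minimal Katz back-off bigram sketch (fixed discount stands in for
# the Good-Turing discounting of full Katz smoothing).
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()  # toy data
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
N = len(corpus)
d = 0.5  # fixed discount; an assumption for illustration

def p_unigram(w):
    return unigrams[w] / N

def p_katz(w_prev, w):
    """Discounted bigram probability, backing off to the unigram."""
    if bigrams[(w_prev, w)] > 0:
        return (bigrams[(w_prev, w)] - d) / unigrams[w_prev]
    # Back-off: redistribute the mass freed by discounting the seen
    # bigrams of w_prev over the unseen words, weighted by unigrams.
    seen = [v for (u, v) in bigrams if u == w_prev]
    left_over = d * len(seen) / unigrams[w_prev]
    denom = 1.0 - sum(p_unigram(v) for v in set(seen))
    return left_over * p_unigram(w) / denom

print(p_katz("the", "cat"))  # seen bigram: discounted estimate
print(p_katz("the", "ran"))  # unseen bigram: backed-off estimate
```

For each context, the discounted probabilities of seen bigrams and the backed-off probabilities of unseen ones sum to one, which is the property smoothing is meant to guarantee.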
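The LSA step described above, SVD of the word-document matrix, equal-dimension vectors for words and documents, and vector similarity as a predictor of word occurrence, can be sketched as follows. The toy matrix and the truncation rank k are assumptions for illustration; Bellegarda's model also typically applies an entropy-based weighting to the counts before the decomposition.

```python
# LSA sketch: SVD of a word-document matrix, then cosine similarity
# between a word vector and a document vector in the latent space.
import numpy as np

# Rows = words, columns = documents; entries are (weighted) counts.
W = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 3.0, 0.0],
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
])

k = 2  # truncation rank of the latent space (illustrative)
U, s, Vt = np.linalg.svd(W, full_matrices=False)
U_k, S_k, V_k = U[:, :k], np.diag(s[:k]), Vt[:k, :].T

word_vecs = U_k @ S_k  # one row per word, all in the same k-dim space
doc_vecs = V_k @ S_k   # one row per document, same space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Similarity between word 0 and document 2: a proxy for how strongly
# that document context predicts the word's occurrence.
print(cosine(word_vecs[0], doc_vecs[2]))
```

Because words and documents land in the same k-dimensional space, a document (or a recognition history treated as a pseudo-document) can be compared directly against every vocabulary word, which is what gives the model its global, content-based view.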
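The abstract does not state the exact formula used to combine the two models. The sketch below uses linear interpolation, one common way to merge a local n-gram estimate with a global LSA estimate; the weight lam is hypothetical and would be tuned on held-out data. Bellegarda's own integration is multiplicative with renormalization, which the thesis may follow instead.

```python
# Hedged sketch of mixing a local (bigram) probability with a global
# (LSA) probability by linear interpolation; an assumed scheme, not
# necessarily the thesis's combination formula.
def p_mixed(p_bigram, p_lsa, lam=0.7):
    """lam weights local vs. global evidence; 0.7 is illustrative."""
    return lam * p_bigram + (1.0 - lam) * p_lsa

print(p_mixed(0.12, 0.03))  # -> 0.093
```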
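Perplexity, used to compare the models, is the exponentiated average negative log-probability the model assigns to a test sequence: PP = exp(-(1/N) * sum_i log p(w_i | h_i)). A minimal, model-agnostic sketch, where prob_fn stands for any of the models above:

```python
# Perplexity of a conditional model over a test token sequence.
import math

def perplexity(prob_fn, tokens):
    """prob_fn(prev, w) -> P(w | prev); tokens is the test sequence."""
    log_sum = sum(math.log(prob_fn(prev, w))
                  for prev, w in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return math.exp(-log_sum / n)

# Sanity check: a uniform model over a 5000-word vocabulary has
# perplexity exactly 5000.
uniform = lambda prev, w: 1.0 / 5000
print(perplexity(uniform, "we test the model here".split()))  # 5000.0
```

A lower perplexity means the model spreads its probability mass less thinly over the test data, which is the sense in which the mixed model's decline in perplexity indicates improved performance.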