LDA Based Cyrillic Mongolian Topic Model

Posted on: 2017-12-11; Degree: Master; Type: Thesis
Country: China; Candidate: Y P Jiang
GTID: 2348330485461608; Subject: Computer Science and Technology
Abstract/Summary:
A topic model is a probabilistic statistical method for modeling text and learning the latent abstract topics it contains. In recent years topic models have been widely used in machine learning and natural language processing. By analyzing the resulting probabilities, we can classify a document and obtain its topic distribution.

This thesis studies topic modeling for Cyrillic Mongolian. We first analyze the features of Cyrillic Mongolian and, based on them, preprocess the text by stripping suffixes, recognizing named entities, and removing stop words. Base words combined with named entities are selected as the feature units used to construct the vector space model. We then compare LSA, PLSA, and LDA to gain a deeper understanding of topic models, including how efficiently each model can be constructed. Based on this comparison, we select LDA to build the Cyrillic Mongolian topic model. We also compare EM and Gibbs sampling as inference methods and identify some ways to improve Gibbs sampling, which improves the performance of the LDA topic model.

Finally, we build the topic model on Cyrillic Mongolian documents. The extracted topics give a quick understanding of a document's overall content and support faster and better machine learning and natural language processing on such text. The thesis gives a full description of the topic model.
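To make the LDA and Gibbs sampling step concrete, the following is a minimal sketch of a standard collapsed Gibbs sampler for LDA. It is not the improved sampler developed in the thesis, whose details the abstract does not give. The function name `lda_gibbs`, the hyperparameter values, and the toy documents are assumptions made for illustration; documents are assumed to be already preprocessed and mapped to integer word ids.

```python
import random

def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Standard collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (theta, phi): per-document topic distributions and
    per-topic word distributions, estimated from the final counts.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # count of topic k in doc d
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                              # total words assigned to topic k
    assign = []                                                 # topic assignment of each token

    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token (unnormalized).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Sample a new topic and add the token back.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
         for w in range(vocab_size)]
        for k in range(num_topics)
    ]
    return theta, phi

# Toy corpus (hypothetical word ids standing in for base words and named entities).
toy_docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 1, 0]]
theta, phi = lda_gibbs(toy_docs, num_topics=2, vocab_size=4)
```

Each row of `theta` is a document's topic distribution and each row of `phi` is a topic's word distribution. A real corpus would need many more documents and iterations than this toy input.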
Keywords/Search Tags: Topic Model, LSA/SVD, PLSA, LDA, EM, Gibbs Sampling, Cyrillic Mongolian