Research On Language Topic Mining Based On LDA

Posted on:2019-04-07

Degree:Master

Type:Thesis

Country:China

Candidate:L Mao

Full Text:PDF

GTID:2417330563993062

Subject:Applied Statistics

Abstract/Summary:

In recent years, with advances in text mining and breakthroughs in Internet technology, people have begun trying to make computers perform deeper natural language tasks, such as intelligent customer service systems and keyword-based search. At the same time, how to automatically learn and understand the latent semantics of millions of human-written texts has become a hot research issue. Since Blei proposed Latent Dirichlet Allocation (LDA), the paper has been cited thousands of times, and the model has been widely used in fields such as search engines, recommendation systems, networks and graphs, and advertising prediction. LDA introduces the dimension of the "topic": it captures the latent semantics of natural language text, reduces a document from word space to topic space, and removes the noise caused by uninformative words.

This thesis focuses on the LDA model and its application. The main work is as follows. First, the probability and statistics underlying the model are introduced, including Bayesian statistics, the multinomial distribution, the Dirichlet distribution, conjugate priors, and the calculation of expectations. Word vector representation, the bag-of-words assumption, and the PLSI model are then described in turn.

Second, the basic principles and essence of the LDA topic model are explained. The model uses a latent variable that follows a Dirichlet distribution to represent the topic distribution of a document, and it constructs a three-layer Bayesian sampling process to simulate how documents are generated. Variational EM (VEM) and Gibbs sampling are both used to estimate the parameters, and the two methods are compared.

Finally, the LDA model is applied to an unsupervised text topic mining project. The data set consists of more than 10,000 selected articles collected by a web crawler. The text is first preprocessed with word segmentation and stop-word removal. TF-IDF is then used to compute term weights, and word clouds are drawn after sparse terms are removed and the dimensionality is reduced. LDA models are then built, and perplexity and log-likelihood are used to evaluate them and to select the final number of topics. A comparison of VEM and Gibbs sampling shows that Gibbs sampling is effective but requires a long training time. Lastly, the similarity between words and topics is computed and documents are recommended according to the input words, which shows that text recommendation based on topic mining is appropriate and feasible.
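The three-layer generative process mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's code: the function names `sample_dirichlet` and `generate_corpus`, the symmetric prior values `alpha` and `beta`, the toy vocabulary, and the fixed document length are all assumptions made for illustration. It uses only the Python standard library.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(n_docs, doc_len, vocab, n_topics,
                    alpha=0.5, beta=0.5, seed=0):
    """Simulate documents with the LDA generative process.

    alpha and beta are symmetric Dirichlet hyperparameters (assumed values);
    doc_len is fixed here for simplicity.
    """
    rng = random.Random(seed)

    # Corpus level: each topic gets its own distribution over the vocabulary.
    topic_word = [sample_dirichlet([beta] * len(vocab), rng)
                  for _ in range(n_topics)]

    docs = []
    for _ in range(n_docs):
        # Document level: each document gets its own topic mixture theta.
        theta = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(doc_len):
            # Word level: pick a topic from theta, then a word from that topic.
            z = rng.choices(range(n_topics), weights=theta)[0]
            w = rng.choices(vocab, weights=topic_word[z])[0]
            words.append(w)
        docs.append(words)
    return docs, topic_word


if __name__ == "__main__":
    toy_vocab = ["model", "topic", "text", "word", "data", "user"]
    corpus, phi = generate_corpus(n_docs=3, doc_len=8,
                                  vocab=toy_vocab, n_topics=2)
    for doc in corpus:
        print(" ".join(doc))
```

Inference methods such as VEM and Gibbs sampling work in the opposite direction: given only the observed words, they estimate the per-document topic mixtures and the per-topic word distributions that this sketch draws at random.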

Keywords/Search Tags:

Text mining, topic model, LDA, VEM, Gibbs sampling

Related items

1	Some Research On Bayesian Statistics In Text Mining
2	Research On College News Topics Discovery Based On LDA Topic Model
3	Research On Microblog Topic Sequential Feature Extraction Algorithm Based On LDA-WO Mixed Model
4	Research On The Influencing Factors Of Childbearing Age Population In My Country Based On Text Mining
5	Topic Mining Research Based On Learner’s Background Information
6	Research On User Preferences Of Cultural And Creative Products Based On Text Mining
7	Topic Mining And Emotion Analysis Of Weibo Caused By Graduate Student Falling From A Building
8	Research On Network Public Opinion Of The 14th National Games Based On Text Mining Technology
9	Research On Hybrid Collaborative Filtering Algorithm Based On Improved User Preferences And Item Features Topic
10	Research On The Application Of Text Mining Technology In Students’ Teaching Evaluation