
Research On Microblog Overlapping Topic Detection Based On Topic Model And Mixture Model

Posted on: 2014-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhan
Full Text: PDF
GTID: 2248330398974600
Subject: Signal and Information Processing
Abstract/Summary:
A microblog is a platform for sharing, disseminating, and accessing information based on user relationships. Microblogs have become one of the main sources of information on the Internet, and they differ markedly from other network text. First, the content is relatively simple (the main body usually contains fewer than 140 words). In addition, posts can be published in real time from mobile phones, instant messaging software, and similar clients, which produces large amounts of data in a short period of time. This data is often huge, messy, and chaotic, so it is extremely difficult to find interesting information accurately and efficiently.
Topic detection is a new research field in natural language processing. It focuses on helping users collect and merge information distributed under the same topic, so that they can find the information they are interested in quickly and accurately. Traditional topic detection algorithms based on the VSM (Vector Space Model) and clustering have achieved good results and support a wide range of applications, but they show shortcomings when applied to large-scale microblog short text. First, representing documents as feature vectors leads to high dimensionality, sparsity, and synonymy problems. In addition, most clustering algorithms used in traditional topic extraction are partitioning methods, which assign each document to a single topic and do not consider the relationships between topics, so they have limitations.
Under these circumstances, a topic model is proposed as the text representation model, according to the characteristics of microblogs. There are three main topic models: Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (PLSA), and Latent Dirichlet Allocation (LDA). LDA is one of the most popular and commonly used topic models today, so this thesis uses the LDA model to extract hidden microblog topic information from the dataset. Then, an overlapping topic detection algorithm based on a mixture model is proposed to address the shortcomings of traditional topic detection algorithms. Finally, a microblog overlapping topic detection system is built. Experimental results on real data sets show the feasibility and validity of the algorithm.
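The difference between a partitioning method and overlapping topic detection can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: it assumes that some mixture model has already produced per-topic membership probabilities for each post, and the function names, the sample probabilities, and the 0.3 threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def hard_assign(memberships: List[float]) -> int:
    """Partitioning-style assignment: keep only the single most probable topic."""
    return max(range(len(memberships)), key=lambda k: memberships[k])


def overlapping_assign(memberships: List[float], threshold: float = 0.3) -> List[int]:
    """Overlapping assignment: keep every topic whose membership reaches the threshold.

    The threshold value is an assumption for this sketch, not taken from the thesis.
    """
    return [k for k, p in enumerate(memberships) if p >= threshold]


# Hypothetical membership probabilities for three posts over three topics.
posts: Dict[str, List[float]] = {
    "post_a": [0.90, 0.05, 0.05],
    "post_b": [0.45, 0.50, 0.05],
    "post_c": [0.10, 0.35, 0.55],
}

hard = {name: hard_assign(p) for name, p in posts.items()}
soft = {name: overlapping_assign(p) for name, p in posts.items()}

print(hard)  # one topic index per post
print(soft)  # possibly several topic indices per post
```

In this sketch, a post whose probability mass is split between two topics is reported under both of them by `overlapping_assign`, whereas `hard_assign` forces it into one, which is the limitation of partitioning methods noted in the abstract.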
Keywords/Search Tags: Microblog, topic model, overlapping topic detection, mixture model