Research And Application Of Topic Detection On Micro-blog

Posted on: 2014-10-23
Degree: Master
Type: Thesis
Country: China
Candidate: L Zheng
Full Text: PDF
GTID: 2268330392969047
Subject: Computer Science and Technology
Abstract/Summary:
With the development of information technology, more and more people use the Internet to publish and access information, and the Internet has become an integral part of people's lives. The growing volume of online information makes it increasingly difficult to find what is relevant and results in information overload. Topic detection technology was proposed to help obtain and manage information better. Its main purpose is to study methods for automatically detecting new topics in multimedia and cross-language information streams. Because micro-blog posts are original, timely, and random in content, traditional topic detection techniques perform unsatisfactorily in the micro-blog environment.

Latent Dirichlet Allocation (LDA) is an unsupervised topic model. In the micro-blog context, its requirement that the number of topics be fixed in advance is a shortcoming: it makes it difficult for the model to fit the real topic distribution of micro-blog. Hierarchical clustering, on the other hand, does not need the number of categories to be specified, and mutual information, as a text feature selection method, discriminates well between features. Therefore, in this paper we propose a topic detection algorithm that combines the LDA model with hierarchical clustering. The algorithm addresses LDA's need for a predetermined number of hot topics by generating topics dynamically according to the correlation between hot topic words.

To retrieve texts relevant to the detected hot topics, we use an algorithm based on hot topic words and the features of micro-blog.

Based on the algorithms above, we built a real-time topic detection system that detects hot topics and retrieves related texts on micro-blog daily. This system achieved good results.
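The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of grouping hot topic words by their correlation without fixing the number of topics in advance. It assumes the hot words have already been extracted (for example from LDA output), uses pointwise mutual information over document co-occurrence as the correlation measure, and uses average-link agglomerative clustering with a stopping threshold. The function names, the threshold value, and the toy data are all illustrative assumptions, not the thesis's actual method.

```python
import math
from itertools import combinations


def pmi_matrix(docs, words):
    """Pointwise mutual information for each word pair, from document co-occurrence.

    Assumption: correlation between hot words is measured by PMI; the thesis
    may use a different formulation of mutual information.
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    df = {w: sum(w in d for d in doc_sets) for w in words}
    pmi = {}
    for a, b in combinations(words, 2):
        co = sum(a in d and b in d for d in doc_sets)
        if co == 0:
            # Words that never co-occur are treated as maximally unrelated.
            pmi[frozenset((a, b))] = float("-inf")
        else:
            pmi[frozenset((a, b))] = math.log((co * n) / (df[a] * df[b]))
    return pmi


def cluster_words(words, pmi, threshold=0.0):
    """Average-link agglomerative clustering of words.

    Merging stops once the strongest remaining link is at or below
    `threshold`, so the number of clusters (topics) is not set in advance.
    """
    clusters = [[w] for w in words]

    def link(c1, c2):
        scores = [pmi[frozenset((a, b))] for a in c1 for b in c2]
        return sum(scores) / len(scores)

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), link(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best <= threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy tokenized posts (hypothetical data, for illustration only).
docs = [
    ["earthquake", "rescue", "sichuan"],
    ["earthquake", "sichuan"],
    ["rescue", "earthquake"],
    ["football", "worldcup", "goal"],
    ["worldcup", "goal"],
    ["football", "goal"],
]
hot_words = ["earthquake", "rescue", "sichuan", "football", "worldcup", "goal"]

topics = cluster_words(hot_words, pmi_matrix(docs, hot_words))
```

On this toy input the six words separate into two groups, one per subject, because words from different subjects never co-occur. On real micro-blog data the threshold would need tuning, and the choice of linkage is a design decision the abstract does not specify.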
Keywords/Search Tags:Micro-blog, Topic Detection, LDA, Clustering, Mutual Information