
Research On Topic Detection And Tracking Of Micro-blog Based On Topic Model

Posted on: 2016-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: L L Xie
GTID: 2308330470477038
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology, the volume of online data is growing quickly, much of it in the form of short texts. Topic detection and tracking is an information processing technology proposed in response to the increasingly serious information explosion on the Internet, and it plays a very important role in the early warning of online public sentiment. Traditional research on topic detection and tracking targets news reports, whose text format and article length are fairly uniform. Their data characteristics differ from those of the short texts popular on the Internet, so the traditional methods are no longer suitable for short text data. For this reason, this thesis proposes a short text clustering method based on a topic model for topic detection and tracking, with the aim of providing strong support for the monitoring of online public sentiment. The specific research work includes:

1. Traditional topic detection and tracking methods are surveyed to gain an understanding of the existing relevant technology. Based on the characteristics of the micro-blog platform, the micro-blog text data format is analyzed together with users’ behavior habits to obtain the text formatting features. The text data of micro-blog topics is characterized by timeliness, sparsity, singularity, redundancy and so on. Because of these characteristics, applying traditional topic detection and tracking technology to micro-blog data leads to severe high-dimensionality and sparsity problems.

2. A topic model algorithm is designed from the analysis of the text format characteristics of micro-blog topics. Its main idea is: preprocess the collected text to obtain the keywords; build a word matrix from the keywords; generate a word association matrix from the word matrix and extract the topic words; then cluster the topic words to generate the topic model.

3. The topic model is applied to topic detection and tracking. Matching a text against the topic model yields the category of the text, which achieves text clustering and thereby topic detection. Running topic detection on the data for each time period gives the number of topics, so the day-by-day topic situation can be obtained through statistical analysis.

Experiments show that this method can effectively solve the problems of processing sparse, high-dimensional data, achieves good accuracy in topic detection, and clearly shows the evolution of topics.
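
The pipeline outlined in items 2 and 3 can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify the preprocessing, the association measure, the clustering algorithm, or the matching rule, so everything concrete here is an assumption. Specifically, keywords come from whitespace tokenization with a toy stopword list (real micro-blog text, especially Chinese, would need a proper word segmenter), association is a raw count of co-occurrence within a post, topic words are grouped as connected components of strongly associated words, a post is matched to the cluster it shares the most words with, and tracking is read as counting matched posts per topic per day. All function names, the threshold `min_cooc`, and the sample posts are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy stopword list; a real system would use a full one.
STOPWORDS = {"the", "a", "is", "in", "of", "and", "to", "on", "for"}

def extract_keywords(text):
    """Preprocess: lowercase, split on whitespace, drop stopwords and short tokens."""
    return [w for w in text.lower().split() if w not in STOPWORDS and len(w) > 2]

def build_association(docs):
    """Word association matrix: how many posts contain both words of a pair."""
    assoc = defaultdict(Counter)
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            assoc[a][b] += 1
            assoc[b][a] += 1
    return assoc

def extract_topic_words(assoc, min_cooc):
    """Assumed rule: a topic word has at least one association >= min_cooc."""
    return {w for w, nbrs in assoc.items() if any(c >= min_cooc for c in nbrs.values())}

def cluster_topic_words(assoc, words, min_cooc):
    """Group topic words into connected components over strong associations."""
    clusters, seen = [], set()
    for start in sorted(words):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            w = stack.pop()
            if w in comp:
                continue
            comp.add(w)
            stack.extend(n for n, c in assoc[w].items()
                         if c >= min_cooc and n in words and n not in comp)
        seen |= comp
        clusters.append(comp)
    return clusters

def match_topic(doc, clusters):
    """Return the index of the cluster sharing the most words with doc, or None."""
    scores = [len(set(doc) & c) for c in clusters]
    if not scores or max(scores) == 0:
        return None
    return scores.index(max(scores))

def daily_topic_counts(posts, clusters):
    """Count matched posts per topic for each day (one reading of 'tracking')."""
    counts = defaultdict(Counter)
    for day, text in posts:
        topic = match_topic(extract_keywords(text), clusters)
        if topic is not None:
            counts[day][topic] += 1
    return counts

# Made-up sample posts: (day, text).
POSTS = [
    ("2015-03-01", "earthquake hits city rescue teams arrive"),
    ("2015-03-01", "rescue teams search after earthquake"),
    ("2015-03-01", "football final tonight big match"),
    ("2015-03-02", "football match final score surprise"),
    ("2015-03-02", "earthquake rescue continues"),
]

docs = [extract_keywords(text) for _, text in POSTS]
assoc = build_association(docs)
words = extract_topic_words(assoc, min_cooc=2)
clusters = cluster_topic_words(assoc, words, min_cooc=2)
counts = daily_topic_counts(POSTS, clusters)

for day in sorted(counts):
    print(day, dict(counts[day]))
```

On the sample posts, the words that co-occur in at least two posts form two clusters, one around the earthquake and one around the football match, and the per-day counts show how many posts fall into each. Working with a small set of topic words rather than the full vocabulary is what keeps the representation low-dimensional in this sketch; whether the thesis achieves this in the same way cannot be determined from the abstract.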
Keywords/Search Tags: Short text, Topic detection and tracking, Topic model, Topic words