
Research on Microblog Hot Topic Detection Technology

Posted on: 2014-01-30
Degree: Master
Type: Thesis
Country: China
Candidate: F Tian
Full Text: PDF
GTID: 2268330422460761
Subject: Computer application technology
Abstract/Summary:
In recent years, microblogging has developed rapidly and has become one of the main ways people obtain and disseminate information; social problems and crises triggered by microblogs have also become frequent. Finding hot topics in the mass of complicated microblog information has therefore become a focus of topic detection and tracking research. Microblog hot topic discovery also provides useful data support for tracking how events develop and for supervising online public opinion, so it has important theoretical value and practical significance. First, to work around the limits and instability of the microblog platform API, this thesis designs a web crawler tool that collects microblog information in both manual and automatic modes. Second, it uses Lucene to preprocess the microblog information. To address the shortcomings of the original vector space model, it applies latent semantic analysis, which uses singular value decomposition of the vector space to reduce the dimensionality of the bag-of-words representation and to suppress semantic noise. Third, after analyzing the advantages and disadvantages of various text classification algorithms, it proposes topic detection and tracking based on the naive Bayes classification algorithm. Naive Bayes is a simple and efficient text classification algorithm: given the item to be classified, it computes the probability of each category conditioned on the item's known features and assigns the item to the category with the highest probability. Next, taking the characteristics of microblog users into account, it uses Lucene's inverted index to quickly locate keywords in the microblogs, then obtains the attention received by the users who posted those microblogs and the attention received by the microblogs containing the keywords, and from these calculates the final topic attention degree.
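The keyword-to-attention step can be pictured with a small sketch. This is not the thesis's implementation: it uses a plain Python dictionary in place of Lucene's inverted index, and the names `build_inverted_index`, `topic_attention`, the `followers` table, and the scoring rule (follower count plus repost count) are all hypothetical choices made only for illustration.

```python
from collections import defaultdict

def build_inverted_index(posts):
    """Map each lowercase term to the set of post ids that contain it."""
    index = defaultdict(set)
    for post_id, post in posts.items():
        for term in post["text"].lower().split():
            index[term].add(post_id)
    return index

def topic_attention(keyword, index, posts, followers):
    """Sum an assumed attention score over every post containing the keyword.

    Assumption: a post's attention is its author's follower count plus its
    repost count. The abstract does not give the actual formula.
    """
    post_ids = index.get(keyword.lower(), set())
    return sum(
        followers.get(posts[pid]["user"], 0) + posts[pid]["reposts"]
        for pid in post_ids
    )

# Toy data, invented for the example.
posts = {
    1: {"user": "alice", "text": "earthquake reported downtown", "reposts": 12},
    2: {"user": "bob", "text": "lunch was great", "reposts": 0},
    3: {"user": "carol", "text": "Earthquake felt here too", "reposts": 3},
}
followers = {"alice": 100, "bob": 5000, "carol": 40}

index = build_inverted_index(posts)
score = topic_attention("earthquake", index, posts, followers)
```

The inverted index makes the lookup proportional to the number of matching posts rather than to the size of the whole collection, which is the reason such a structure suits a quick keyword search.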
Compared with traditional methods of calculating microblog topic popularity, this attention-degree algorithm gives more consideration to the characteristics of microblog users themselves, and is therefore more effective and accurate. Finally, building on the above work, this thesis implements a microblog public opinion analysis system and tests it on collected data; analysis of the experimental results identifies the aspects of the study that need improvement and clarifies the focus of future research.
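The naive Bayes classification step named in the abstract can be sketched in a few lines. The version below is a generic multinomial naive Bayes with add-one (Laplace) smoothing, written as an assumption about what such a classifier typically looks like; the abstract does not specify the variant, the features, or the categories, and the functions `train` and `predict` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (tokens, label) pairs. Returns the counts used by predict."""
    class_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # per-label word frequencies
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(tokens, model):
    """Return the label with the highest posterior log-probability."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            # Add-one smoothing keeps unseen words from zeroing the product.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training set, invented for the example.
docs = [
    (["earthquake", "rescue", "city"], "disaster"),
    (["flood", "rescue", "warning"], "disaster"),
    (["match", "goal", "team"], "sports"),
    (["team", "win", "final"], "sports"),
]
model = train(docs)
label = predict(["rescue", "flood"], model)
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and choosing the maximum score corresponds to the "assign to the category with the highest probability" rule.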
Keywords/Search Tags:Micro blog, Topic discovery, Crawlers, Search engines