
Research On Key Technology Of Hot Topic Perception On Micro-Blogs

Posted on: 2016-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: H H Xi
GTID: 2298330467972478
Subject: Information networks and security
Abstract/Summary:
Micro-blog platforms, a product of the Web 2.0 era, have developed rapidly in recent years and have become an important channel through which public opinion spreads. It is therefore crucial to mine, extract, analyze and monitor public opinion on micro-blogs. Hot topic perception can not only discover hot words and popular events on micro-blogs but also reveal social dynamics and what ordinary people are thinking about, which gives the technique both strong social implications and practical significance. The main work of this thesis is as follows:

1. Because micro-blog streams contain a large number of advertising posts as well as other noise posts, we add an advertisement filtering module and a noise filtering module before the text clustering module. This improves the traditional text-based topic perception mechanism and makes clustering more efficient. Based on the behavior patterns common to advertising users, we use the C4.5 decision tree classification algorithm to filter advertising micro-blogs, and we optimize the method for selecting thresholds when splitting continuous attributes. In the noise filtering module, we present a noise-scoring algorithm based on the frequency of feature terms: micro-blog texts that contain no high-frequency feature term are regarded as noise and filtered out. To keep the feature-term collection from growing too large, and because micro-blog topics are time-sensitive, we apply a sliding window to the word-frequency statistics. In addition, because different parts of speech contribute differently to topic representation, the scoring algorithm weights each feature term by its part of speech. The algorithm filters the remaining noise micro-blogs effectively.

2. Because micro-blog topics are time-sensitive, we add a time parameter to the cosine distance formula to make the text similarity calculation in the topic perception module more accurate. In the text clustering module, since the value of K and the initial topic centers are difficult to determine for the K-means clustering algorithm, we add a preliminary segmentation step, based on the feature-term collection, before running the algorithm. This step improves how K and the topic centers are established and enhances the performance of K-means. Traditional heat evaluation algorithms for micro-blog topics consider only user participation. To make the assessment more objective and comprehensive, we improve the algorithm by combining the propagation influence of micro-blogs with user participation.

Finally, we implement a hot topic perception system for micro-blogs in Java and design a series of experiments on it to verify the capability of our mechanism. The experiments show that the improved mechanism performs well in micro-blog hot topic perception.
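For the advertisement filter in contribution 1, the abstract does not spell out how the C4.5 threshold selection for continuous attributes is optimized. The Java sketch below shows one standard optimization, not necessarily the author's: samples are sorted by attribute value, equal values are grouped, and information gain is evaluated only at boundary points between adjacent value groups, because a known result for entropy-based splits is that a cut between two pure groups of the same class never yields the best gain. For simplicity it cuts at the midpoint between groups. The class ThresholdSelector, the method bestThreshold and the example attribute are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: C4.5-style threshold search for one continuous attribute,
// restricted to boundary points between value groups.
final class ThresholdSelector {

    /** Binary class entropy (in bits) of a split with pos/neg samples. */
    static double entropy(int pos, int neg) {
        int n = pos + neg;
        if (n == 0) return 0.0;
        double h = 0.0;
        for (int c : new int[] {pos, neg}) {
            if (c > 0) {
                double p = (double) c / n;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    /**
     * Returns the threshold t that maximizes information gain for the test "value <= t",
     * or Double.NaN if no boundary point exists (e.g. all samples share one class).
     */
    static double bestThreshold(double[] values, boolean[] isAd) {
        int n = values.length;
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Arrays.sort(idx, (x, y) -> Double.compare(values[x], values[y]));

        // Collapse equal values into groups of {value, adCount, normalCount}.
        List<double[]> groups = new ArrayList<>();
        for (int i : idx) {
            if (groups.isEmpty() || groups.get(groups.size() - 1)[0] != values[i]) {
                groups.add(new double[] {values[i], 0, 0});
            }
            groups.get(groups.size() - 1)[isAd[i] ? 1 : 2]++;
        }

        int totalAd = 0;
        for (boolean b : isAd) if (b) totalAd++;
        double base = entropy(totalAd, n - totalAd);

        double bestGain = -1.0, bestT = Double.NaN;
        int leftAd = 0, leftNormal = 0;
        for (int j = 0; j + 1 < groups.size(); j++) {
            double[] g = groups.get(j), h = groups.get(j + 1);
            leftAd += (int) g[1];
            leftNormal += (int) g[2];
            // Skip non-boundary cuts: both neighbouring groups pure and of the same class.
            boolean gPure = g[1] == 0 || g[2] == 0, hPure = h[1] == 0 || h[2] == 0;
            if (gPure && hPure && (g[1] > 0) == (h[1] > 0)) continue;
            int leftN = leftAd + leftNormal, rightN = n - leftN;
            double cond = (leftN * entropy(leftAd, leftNormal)
                    + rightN * entropy(totalAd - leftAd, n - totalAd - leftNormal)) / n;
            double gain = base - cond;
            if (gain > bestGain) {
                bestGain = gain;
                bestT = (g[0] + h[0]) / 2.0; // midpoint cut
            }
        }
        return bestT;
    }

    public static void main(String[] args) {
        // Hypothetical attribute: links posted per day by each user.
        double[] linksPerDay = {1, 2, 3, 7, 8, 9};
        boolean[] isAd = {false, false, false, true, true, true};
        System.out.println(bestThreshold(linksPerDay, isAd)); // 5.0
    }
}
```

Only one cut is evaluated in the example, because the other four candidate positions lie between groups of the same class.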
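The noise-scoring step of contribution 1 can be sketched as follows, assuming each post has already been segmented into words tagged with a part of speech (tags beginning with "n" for nouns and "v" for verbs, as in common Chinese tag sets). A post's score is the part-of-speech-weighted count of its distinct terms whose document frequency within a sliding window of recent posts reaches a minimum, and posts scoring below a cut-off are treated as noise. The class NoiseFilter, the window size, the frequency and score cut-offs, and the weights 1.0/0.6/0.2 are illustrative assumptions, not values from the thesis.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of frequency-based noise scoring with a sliding window.
final class NoiseFilter {
    /** A segmented word and its part-of-speech tag (tag set is an assumption). */
    static final class Token {
        final String term;
        final String pos;
        Token(String term, String pos) { this.term = term; this.pos = pos; }
    }

    private final int windowSize;   // number of most recent posts kept in the window
    private final int minFreq;      // document frequency that makes a term "high-frequency"
    private final double minScore;  // posts scoring below this are treated as noise
    private final Deque<Set<String>> window = new ArrayDeque<>();
    private final Map<String, Integer> docFreq = new HashMap<>();

    NoiseFilter(int windowSize, int minFreq, double minScore) {
        this.windowSize = windowSize;
        this.minFreq = minFreq;
        this.minScore = minScore;
    }

    /** Illustrative weights: nouns carry the most topic information, then verbs. */
    static double posWeight(String pos) {
        if (pos.startsWith("n")) return 1.0;
        if (pos.startsWith("v")) return 0.6;
        return 0.2;
    }

    /** Adds a post's distinct terms to the window and evicts the oldest post if it is full. */
    void observe(List<Token> post) {
        Set<String> terms = new HashSet<>();
        for (Token t : post) terms.add(t.term);
        window.addLast(terms);
        for (String term : terms) docFreq.merge(term, 1, Integer::sum);
        if (window.size() > windowSize) {
            for (String term : window.removeFirst()) {
                // Decrement, dropping the entry once no post in the window uses the term.
                docFreq.computeIfPresent(term, (k, v) -> v > 1 ? v - 1 : null);
            }
        }
    }

    /** POS-weighted count of the post's distinct high-frequency terms. */
    double score(List<Token> post) {
        Set<String> seen = new HashSet<>();
        double s = 0.0;
        for (Token t : post) {
            if (seen.add(t.term) && docFreq.getOrDefault(t.term, 0) >= minFreq) {
                s += posWeight(t.pos);
            }
        }
        return s;
    }

    /** Records the post in the window, then reports whether it should be filtered out. */
    boolean isNoise(List<Token> post) {
        observe(post);
        return score(post) < minScore;
    }

    public static void main(String[] args) {
        NoiseFilter filter = new NoiseFilter(1000, 2, 1.0);
        List<Token> a = List.of(new Token("earthquake", "n"), new Token("rescue", "v"));
        List<Token> b = List.of(new Token("earthquake", "n"), new Token("today", "t"));
        System.out.println(filter.isNoise(a)); // true: no term has reached frequency 2 yet
        System.out.println(filter.isNoise(b)); // false: "earthquake" now appears in 2 posts
    }
}
```

Evicting old posts keeps the term table bounded by the window size and lets the notion of "high-frequency" follow the current topics rather than the whole history.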
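The abstract does not give the exact form of the time parameter added to the cosine formula in contribution 2. One plausible form, assumed here, damps the cosine similarity of two posts exponentially with the gap between their timestamps: sim(d_i, d_j) = cos(d_i, d_j) · exp(−λ·|t_i − t_j|), where λ ≥ 0 is a hypothetical decay rate. Documents are sparse term-weight maps, and the timestamp unit (hours, for example) is left to the caller.

```java
import java.util.Map;

// Hypothetical sketch: cosine similarity damped by the time gap between two posts.
final class TimeAwareSimilarity {

    /** Plain cosine similarity of two sparse term-weight vectors. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) dot += e.getValue() * w;
            normA += e.getValue() * e.getValue();
        }
        for (double w : b.values()) normB += w * w;
        if (normA == 0.0 || normB == 0.0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** cos(a, b) * exp(-lambda * |tA - tB|); a larger lambda forgets older posts faster. */
    static double similarity(Map<String, Double> a, long tA,
                             Map<String, Double> b, long tB, double lambda) {
        return cosine(a, b) * Math.exp(-lambda * Math.abs(tA - tB));
    }

    public static void main(String[] args) {
        Map<String, Double> post = Map.of("earthquake", 0.8, "rescue", 0.6);
        System.out.println(similarity(post, 0, post, 0, 0.1));  // same time: plain cosine
        System.out.println(similarity(post, 0, post, 10, 0.1)); // 10 hours apart: damped by exp(-1)
    }
}
```

Setting λ to 0 recovers the ordinary cosine measure, so the decay can be tuned against the unmodified baseline.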
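The preliminary segmentation that precedes K-means in contribution 2 could look like the sketch below. As an assumption, each document vector (one dimension per feature term) is first grouped by its dominant feature term, meaning its largest component; K is then the number of distinct dominant terms, and each group's mean becomes an initial center, after which ordinary Lloyd iterations with Euclidean distance refine the clusters. The thesis's own segmentation rule over the feature-term collection may differ, and SeededKMeans is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: K-means whose K and initial centers come from a preliminary grouping.
final class SeededKMeans {

    /** Index of the largest component, i.e. the document's dominant feature term. */
    static int dominantTerm(double[] v) {
        int best = 0;
        for (int i = 1; i < v.length; i++) if (v[i] > v[best]) best = i;
        return best;
    }

    static double[] mean(List<double[]> vs, int dim) {
        double[] m = new double[dim];
        for (double[] v : vs) for (int i = 0; i < dim; i++) m[i] += v[i];
        for (int i = 0; i < dim; i++) m[i] /= vs.size();
        return m;
    }

    static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return s;
    }

    /** Preliminary segmentation: one initial cluster per distinct dominant term. */
    static List<double[]> initialCenters(List<double[]> docs) {
        Map<Integer, List<double[]>> groups = new LinkedHashMap<>();
        for (double[] d : docs) groups.computeIfAbsent(dominantTerm(d), k -> new ArrayList<>()).add(d);
        List<double[]> centers = new ArrayList<>();
        for (List<double[]> g : groups.values()) centers.add(mean(g, docs.get(0).length));
        return centers;
    }

    /** Lloyd iterations from the seeded centers; returns a cluster id per document (docs non-empty). */
    static int[] cluster(List<double[]> docs, int maxIter) {
        List<double[]> centers = initialCenters(docs);
        int k = centers.size(), dim = docs.get(0).length;
        int[] assign = new int[docs.size()];
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            for (int i = 0; i < docs.size(); i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (sqDist(docs.get(i), centers.get(c)) < sqDist(docs.get(i), centers.get(best))) best = c;
                }
                if (iter == 0 || assign[i] != best) { assign[i] = best; changed = true; }
            }
            if (!changed) break;
            // Move each center to the mean of its members; keep it if the cluster is empty.
            for (int c = 0; c < k; c++) {
                List<double[]> members = new ArrayList<>();
                for (int i = 0; i < docs.size(); i++) if (assign[i] == c) members.add(docs.get(i));
                if (!members.isEmpty()) centers.set(c, mean(members, dim));
            }
        }
        return assign;
    }

    public static void main(String[] args) {
        List<double[]> docs = List.of(new double[] {5, 0}, new double[] {4, 1},
                                      new double[] {0, 5}, new double[] {1, 4});
        System.out.println(Arrays.toString(cluster(docs, 10))); // [0, 0, 1, 1]
    }
}
```

Because K falls out of the grouping, the caller never supplies it, which is the difficulty with plain K-means that the abstract points to.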
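A combined heat score, as described at the end of contribution 2, might be computed as below. All definitions are assumptions for illustration: participation counts distinct posting users plus the comments and reposts the topic's posts received; propagation influence sums each post's reposts weighted by ln(1 + follower count) of its author; and both are normalized by their maximum over the current batch of topics and mixed with a hypothetical weight alpha. TopicHeat and its Post fields are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: topic heat = weighted mix of participation and propagation influence.
final class TopicHeat {
    /** One micro-blog post in a topic (fields are illustrative). */
    static final class Post {
        final String userId;
        final int reposts;
        final int comments;
        final long followers; // follower count of the post's author
        Post(String userId, int reposts, int comments, long followers) {
            this.userId = userId; this.reposts = reposts;
            this.comments = comments; this.followers = followers;
        }
    }

    /** Participation: distinct posting users plus comments and reposts received. */
    static double participation(List<Post> topic) {
        Set<String> users = new HashSet<>();
        double interactions = 0.0;
        for (Post p : topic) { users.add(p.userId); interactions += p.reposts + p.comments; }
        return users.size() + interactions;
    }

    /** Propagation influence: reposts weighted by ln(1 + followers) of the author. */
    static double influence(List<Post> topic) {
        double s = 0.0;
        for (Post p : topic) s += p.reposts * Math.log1p(p.followers);
        return s;
    }

    /** heat_i = alpha * P_i / max(P) + (1 - alpha) * I_i / max(I) over a batch of topics. */
    static double[] heat(List<List<Post>> topics, double alpha) {
        int n = topics.size();
        double[] part = new double[n], infl = new double[n];
        double maxP = 0.0, maxI = 0.0;
        for (int i = 0; i < n; i++) {
            part[i] = participation(topics.get(i));
            infl[i] = influence(topics.get(i));
            maxP = Math.max(maxP, part[i]);
            maxI = Math.max(maxI, infl[i]);
        }
        double[] h = new double[n];
        for (int i = 0; i < n; i++) {
            double pNorm = maxP > 0 ? part[i] / maxP : 0.0;
            double iNorm = maxI > 0 ? infl[i] / maxI : 0.0;
            h[i] = alpha * pNorm + (1 - alpha) * iNorm;
        }
        return h;
    }

    public static void main(String[] args) {
        List<List<Post>> topics = List.of(
            List.of(new Post("u1", 100, 10, 1_000_000L)),
            List.of(new Post("u2", 1, 1, 10L), new Post("u3", 1, 1, 10L)));
        // The first topic leads on both measures, so its heat is 1.0.
        System.out.println(Arrays.toString(heat(topics, 0.5)));
    }
}
```

Setting alpha to 1 reduces the score to the participation-only evaluation that the abstract describes as the traditional approach.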
Keywords/Search Tags: Micro-blog, Text, Classification, Clustering, Hot topics