Research On Topic Detection Of Weibo Based On Improved TF-IDF Algorithm

Posted on: 2016-10-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z S Jin
Full Text: PDF
GTID: 2308330476454975
Subject: Computer Science and Technology
Abstract/Summary:
With the development of computer technology, Web 2.0 has matured and the number of applications built on it keeps growing. Against this background, social networking platforms have grown explosively and have had a revolutionary impact on how people live, make friends, shop, and communicate. Among these platforms, Twitter and Sina Weibo are the most representative. Through Weibo, people can express their views or discuss news they have heard or seen, anytime and anywhere. They can be both the author of a message and the person who passes it on. As the information on the network becomes richer and more varied, readers also have to cope with information overload. Quickly and accurately filtering the content that people are interested in out of the huge volume of Weibo posts has therefore become a new challenge.
To address this problem, this paper proposes an improved detection algorithm, based on an in-depth study of recent research on topic detection in China and abroad and on an analysis of the characteristics of Weibo. Weibo content is updated quickly and is highly time-sensitive. Hot topics on Weibo are bursty, but the occurrence of their representative words increases noticeably. Exploiting this property, raising the weight of these representative words to a certain degree is a good way to make the features of short texts more prominent. An improved feature extraction algorithm named TF-IDF-KE (Term Frequency-Inverse Document Frequency-Kinetic Energy) is therefore proposed on the basis of TF-IDF (Term Frequency-Inverse Document Frequency). The algorithm uses the principle of kinetic energy from physics to describe bursty features and increases the weight of those features during feature extraction. Finally, a text clustering algorithm is applied to complete the Weibo topic detection task.
The method presented in this paper describes the burstiness of texts and features and, to a certain extent, solves the problem that the features of bursty hot topics are not prominent during clustering. The experimental results show that the method improves the effectiveness of topic detection to some degree.
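The abstract does not give the TF-IDF-KE formula, so the following is only a minimal sketch of the general idea, under stated assumptions. It treats a term's share of posts in the current time window as its "mass" and the change in that share from the previous window as its "velocity", then scales a smoothed TF-IDF weight by the resulting kinetic energy E = 0.5 * m * v^2. The function names, the `alpha` parameter, the two-window setup, and the multiplicative combination are all hypothetical and may differ from the thesis.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Smoothed TF-IDF. docs is a list of token lists; returns one {term: weight} dict per doc."""
    n_docs = len(docs)
    # Document frequency: number of docs that contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # Smoothed IDF keeps the weight positive even for terms found in every doc.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return weights


def kinetic_energy(prev_docs, curr_docs):
    """Assumed burst score per term: E = 0.5 * m * v**2.

    m is the fraction of current-window docs containing the term, and
    v is the increase of that fraction over the previous window.
    Terms whose share did not increase get an energy of 0.0.
    """
    prev_df = Counter(term for doc in prev_docs for term in set(doc))
    curr_df = Counter(term for doc in curr_docs for term in set(doc))
    n_prev = max(len(prev_docs), 1)
    n_curr = max(len(curr_docs), 1)
    energy = {}
    for term, count in curr_df.items():
        mass = count / n_curr
        velocity = mass - prev_df[term] / n_prev
        energy[term] = 0.5 * mass * velocity ** 2 if velocity > 0 else 0.0
    return energy


def tf_idf_ke(prev_docs, curr_docs, alpha=1.0):
    """Boost TF-IDF weights of bursty terms; alpha (hypothetical) controls the boost strength."""
    energy = kinetic_energy(prev_docs, curr_docs)
    return [
        {term: w * (1.0 + alpha * energy.get(term, 0.0)) for term, w in doc.items()}
        for doc in tf_idf(curr_docs)
    ]


if __name__ == "__main__":
    previous = [["weather", "today"], ["lunch", "today"]]
    current = [["earthquake", "today"], ["earthquake", "news"], ["lunch", "today"]]
    for post in tf_idf_ke(previous, current):
        print(post)
```

In this sketch a term such as "earthquake", which is absent from the previous window and frequent in the current one, receives a larger weight than plain TF-IDF would give it, while steady terms such as "today" are left unchanged. The boosted vectors could then be passed to any text clustering algorithm.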
Keywords/Search Tags: Weibo, term frequency–inverse document frequency (TF-IDF), topic detection, topic detection and tracking (TDT), text clustering