Research And Analysis On Microblog Hot Topic Detection

Posted on: 2013-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: W W Pang
GTID: 2308330482472821
Subject: Computer software and theory
Abstract/Summary:
In recent years, with the continued development of the Internet, microblogging has become an important channel of online communication and has gradually become part of people's daily lives. Microblogs let users obtain and share information anytime and anywhere. However, because publishing on a microblog is easy and information spreads very quickly, microblogs also create a serious information overload problem. Faced with a massive, rapidly updating stream of microblog posts, people's attention is scattered and they can no longer effectively pick out the hot topics they need. How to detect hot topics in the microblog stream accurately and efficiently has therefore become an important problem in microblog research. Such research can not only ease microblog information overload but also contribute to the monitoring of hot online events and public opinion.

To find hot topics efficiently in large volumes of microblog data, this thesis takes microblog short texts as its main research object and treats microblog hot topic detection as a text clustering problem. Based on an analysis of the characteristics of microblog short texts and of existing text clustering approaches, the thesis proposes a semantic clustering method based on frequent trend word sets (FTSC) and designs and implements a prototype system for microblog hot topic detection. The method achieved good results on a real microblog data set. The main work and achievements are as follows:

1. By comparing the advantages and disadvantages of existing microblog capture methods against the data requirements of the prototype system, we designed an information collector based on the MicroBlog API. Adding a pool of access tokens lessens the impact of the API usage restrictions and strengthens the collector's information acquisition capacity.

2. Based on the characteristics of microblog short texts and the goals of microblog topic detection, we propose a time-responsive feature selection method for microblogs.

3. We take the frequent trend words obtained from frequent patterns as the core features that describe a microblog topic. We map these words into the "Hownet" semantic library to expand the semantic information of the short texts, adopt the "cluster-center" clustering idea, and propose the semantic clustering method based on the frequent trend word set (FTSC). We analyze how to set the minimum cluster support θ and the inter-cluster similarity threshold λ, design and implement the prototype system for microblog hot topic detection, and verify its effectiveness. We also visualize the clustering results and explore the relations among the topic clusters.
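
As a rough illustration of how a "cluster-center" scheme with the two thresholds θ and λ might operate, the following Python sketch assigns each post's word set to the most similar existing cluster center, or opens a new cluster when no center is similar enough. This is not the FTSC method itself, whose details are not given in this abstract. The function names are hypothetical, plain Jaccard word overlap stands in for the "Hownet"-based semantic similarity, and the readings of λ (minimum similarity needed to join a cluster) and θ (minimum number of posts a cluster needs to be kept) are assumptions made for the example.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-overlap similarity. ASSUMPTION: a simple stand-in for the
    Hownet-based semantic similarity mentioned in the abstract."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_center_clustering(
    word_sets: List[Set[str]], lam: float, theta: int
) -> List[List[int]]:
    """Single-pass clustering around fixed cluster centers (hypothetical sketch).

    lam   -- ASSUMED meaning: minimum similarity to a center to join its cluster
    theta -- ASSUMED meaning: minimum number of posts for a cluster to be kept
    Returns the kept clusters as lists of post indices.
    """
    centers: List[Set[str]] = []   # the seed word set of each cluster
    members: List[List[int]] = []  # post indices assigned to each cluster

    for idx, words in enumerate(word_sets):
        # Find the most similar existing center, if any.
        best, best_sim = -1, 0.0
        for c, center in enumerate(centers):
            sim = jaccard(words, center)
            if sim > best_sim:
                best, best_sim = c, sim

        if best >= 0 and best_sim >= lam:
            members[best].append(idx)
        else:
            # No center is similar enough: this post seeds a new cluster.
            centers.append(set(words))
            members.append([idx])

    # Drop clusters with too little support to count as a hot topic.
    return [m for m in members if len(m) >= theta]
```

In this sketch, raising λ yields more and tighter clusters, while raising θ discards small clusters that few posts support; both values would need tuning on real data.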
Keywords/Search Tags: micro blog, topic detection, feature selection, frequent pattern, text clustering