Research On Chinese Blog Hot Topic Detection And Tracking

Posted on: 2008-11-12
Degree: Master
Type: Thesis
Country: China
Candidate: W L Ding
GTID: 2178360245997695
Subject: Computer Science and Technology
Abstract/Summary:
Blogs are becoming a new model of personal publishing on the Internet, and they make more and more information openly available and useful. To acquire meaningful knowledge quickly, there is an urgent need to mine information in the blog domain. Current blog mining techniques focus on extracting key information, gathering statistics, and analyzing them. Topic detection and topic tracking are techniques that organize information by automatically detecting topics in news streams and tracking given topics.

This thesis applies topic detection, topic tracking, and hot topic ranking techniques to mine topic knowledge from blogs. With these techniques, blog information can be effectively organized and classified by topic, so that information of interest can be tracked without having to sift through unrelated content.

The thesis first introduces text clustering and text classification, the underlying techniques of topic detection and topic tracking. Based on the characteristics of blogs, it then designs topic detection and topic tracking methods. For similarity computation, it proposes a cosine-similarity-based method that combines word form and word frequency, and a new method that combines title similarity with body-text similarity. For topic name extraction, it proposes weighting words with a tf*df measure. Experiments on blog data show that both similarity improvements are effective and that topic detection and tracking are feasible in the blog domain. For hot topic ranking, the thesis uses the number of articles, the number of comments, and the number of commentators in a topic as ranking features, and provides four ranking methods. To handle large volumes of data, it proposes distributed processing techniques for topic detection, hot topic ranking, and topic tracking.
Experiments show that distributed processing efficiently reduces the system's running time.
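The abstract states two things about similarity. Similarity is based on cosine similarity, and title similarity is combined with body-text similarity. It does not give the exact weighting, and it does not say how word form and word frequency are merged.

The sketch below is only one plausible reading, and it rests on these assumptions:
- Documents are already word-segmented token lists.
- Cosine similarity is taken over term-frequency vectors.
- The title and body scores are mixed linearly with a hypothetical weight `alpha`.

The function names `cosine` and `combined_similarity` are illustrative and are not taken from the thesis.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors (Counters)."""
    # Dot product over the terms the two vectors share.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_similarity(title_a, body_a, title_b, body_b, alpha=0.3):
    """Hypothetical linear mix: alpha * title similarity + (1 - alpha) * body similarity.

    All arguments are token lists (word segmentation is assumed to be done already);
    alpha is an illustrative weight, not a value from the thesis.
    """
    title_sim = cosine(Counter(title_a), Counter(title_b))
    body_sim = cosine(Counter(body_a), Counter(body_b))
    return alpha * title_sim + (1 - alpha) * body_sim
```

Term counts are never negative, so both cosine scores lie in [0, 1]. A linear mix with `alpha` in [0, 1] therefore also stays in [0, 1], which lets a single clustering threshold apply to the combined score.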
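The abstract does not name the clustering algorithm used for topic detection. This sketch uses single-pass incremental clustering as an assumed stand-in, since it is a common baseline in topic detection and tracking. It works as follows:
- Each incoming post joins the most similar existing topic if its cosine similarity to that topic's summed term vector reaches a threshold.
- Otherwise the post starts a new topic.
- Tracking then reduces to testing new posts against a chosen topic's vector.

The threshold value and all function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass_cluster(docs, threshold=0.5):
    """Assumed single-pass clustering: docs is a list of Counters, processed in order.

    Returns (labels, centroids), where centroids[i] is the summed term vector of topic i.
    The threshold is an illustrative value.
    """
    centroids = []
    labels = []
    for doc in docs:
        best, best_sim = -1, 0.0
        for i, centroid in enumerate(centroids):
            sim = cosine(doc, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            centroids[best].update(doc)       # add the post's term counts to the topic
            labels.append(best)
        else:
            centroids.append(Counter(doc))    # copy, so the input doc is not mutated
            labels.append(len(centroids) - 1)
    return labels, centroids


def track(doc, topic_centroid, threshold=0.5):
    """Hypothetical tracking test: does a new post belong to the given topic?"""
    return cosine(doc, topic_centroid) >= threshold
```

Cosine similarity ignores vector length, so the summed term counts can serve directly as the topic centroid without dividing by the number of posts.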
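For topic naming, the abstract only says that word weights are computed with tf*df. This sketch assumes one reading of that, applied within a single topic cluster:
- tf is a word's total count over the topic's posts.
- df is the number of those posts that contain the word.

Under this reading, words that are both frequent and widespread within the topic rank highest. The helper name `topic_name` and the cutoff `k` are hypothetical.

```python
from collections import Counter


def topic_name(topic_docs, k=3):
    """Return the k highest-weighted words of one topic under an assumed tf*df weighting.

    topic_docs: list of token lists, all belonging to the same topic.
    """
    tf = Counter()  # total occurrences of each word in the topic
    df = Counter()  # number of posts in the topic containing each word
    for tokens in topic_docs:
        tf.update(tokens)
        df.update(set(tokens))
    weights = {w: tf[w] * df[w] for w in tf}
    # Highest weight first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(weights, key=lambda w: (-weights[w], w))
    return ranked[:k]
```

For example, given the posts `["olympic", "beijing", "olympic"]`, `["olympic", "torch"]` and `["beijing", "olympic"]`, the weights are olympic 4*3 = 12, beijing 2*2 = 4 and torch 1*1 = 1.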
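The three ranking features (number of articles, comments, and commentators per topic) come from the abstract. The four ranking methods themselves are not described there. The sketch below shows one simple scheme, and it is not claimed to be any of the four. Each feature is divided by its maximum across topics, and the normalized values are combined with hypothetical weights.

```python
from dataclasses import dataclass


@dataclass
class TopicStats:
    name: str
    articles: int
    comments: int
    commentators: int


def rank_hot_topics(topics, weights=(0.4, 0.3, 0.3)):
    """Sort topics by a weighted sum of max-normalized features (illustrative weights)."""
    if not topics:
        return []
    # "or 1" avoids division by zero when a feature is zero for every topic.
    max_articles = max(t.articles for t in topics) or 1
    max_comments = max(t.comments for t in topics) or 1
    max_users = max(t.commentators for t in topics) or 1
    w_articles, w_comments, w_users = weights

    def score(t):
        return (w_articles * t.articles / max_articles
                + w_comments * t.comments / max_comments
                + w_users * t.commentators / max_users)

    return sorted(topics, key=score, reverse=True)
```

Normalizing by the maximum puts article, comment, and commentator counts on the same scale. This stops comment counts, which are typically much larger than article counts, from dominating the score.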
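The abstract mentions distributed processing without describing its design. As an assumed illustration, the hot-topic ranking features can be computed in three steps:
1. Split the posts into partitions.
2. Aggregate each partition independently (the map step).
3. Merge the partial results (the reduce step).

Here a thread pool stands in for separate machines. The record layout `(topic_id, comment_count, commentator_ids)` and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_partition(posts):
    """Aggregate one partition of hypothetical (topic_id, comment_count, commentator_ids) records."""
    articles = Counter()
    comments = Counter()
    commentators = defaultdict(set)
    for topic, n_comments, users in posts:
        articles[topic] += 1
        comments[topic] += n_comments
        commentators[topic] |= set(users)
    return articles, comments, commentators


def reduce_partials(partials):
    """Merge per-partition results into {topic: (articles, comments, distinct commentators)}."""
    articles, comments, commentators = Counter(), Counter(), defaultdict(set)
    for part_articles, part_comments, part_users in partials:
        articles.update(part_articles)   # Counter.update adds counts
        comments.update(part_comments)
        for topic, users in part_users.items():
            commentators[topic] |= users  # union, not sum: a user may appear in several partitions
    return {t: (articles[t], comments[t], len(commentators[t])) for t in articles}


def distributed_topic_stats(partitions, workers=4):
    """Run the map step concurrently (threads stand in for machines), then reduce."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce_partials(partials)
```

Article and comment counts can simply be summed across partitions. The number of distinct commentators cannot, because one user may comment in several partitions, so the sketch merges the commentator sets before counting them. For CPU-bound work in Python, a process pool or a cluster framework would replace the thread pool.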
Keywords/Search Tags: blog, topic detection, topic tracking, hot topic ranking, distributed processing