Research On BBS Topic Detection And Tracking

Posted on: 2012-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Xi
Full Text: PDF
GTID: 2218330371962641
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of the Internet, BBS has become an important data source for public opinion. BBS topic detection and tracking (TDT) technologies can effectively organize massive, disordered, and dispersed data, promptly detecting hot topics in BBS and following the development of key topics. This technology can therefore help the relevant departments keep abreast of public opinion and take appropriate measures. Drawing on the theory and methods of classical TDT for news text and on the characteristics of BBS data, this dissertation studies the key technologies of BBS topic detection and tracking, including formalized representation of BBS data, BBS hot topic detection, and BBS key topic tracking. The major contributions are as follows:

(1) The limitations of current text representation technologies when applied to BBS data are analyzed. According to the characteristics of BBS, a text representation method for BBS data based on a multi-factor weighting strategy is proposed. The method considers not only term frequency and inverse document frequency but also entity information weight and location information weight. Experimental results show that the method represents BBS data effectively.

(2) A multi-strategy BBS hot topic detection method is presented. First, a candidate set of hot topic features is extracted and then filtered. Second, post threads are retrieved according to the features in the candidate set to obtain pseudo-topics. Third, hot topics are obtained through hierarchical clustering of the threads in each pseudo-topic, and the intersection is then re-determined. Finally, the topic heat score is calculated. Experimental results show that this method detects BBS hot topics effectively, maintaining detection accuracy while reducing the time and space complexity of the traditional method.

(3) A BBS topic tracking method based on semantic similarity is put forward. BBS data may be semantically similar while differing in surface form, and the method addresses this problem. First, the semantic similarity between words is calculated using HowNet. Second, keyword lists for the topic and the thread are built to obtain the topic model and the thread model, each according to its corresponding key term weighting method. Finally, BBS topic tracking is carried out by calculating the semantic similarity between the two keyword lists, which is used as the relevance between the thread and the topic. Experimental results show that this method effectively tracks the threads related to the same topic.
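
The tracking step can be illustrated with a minimal Python sketch. The abstract does not give the formula used to compare the two keyword lists, so the weighted best-match average below is an assumption, as is the relevance threshold. The `WORD_SIM` table is a hypothetical stand-in for a HowNet-based word similarity measure, and its values are made up for illustration.

```python
from typing import Dict, Tuple

# Hypothetical word-to-word similarity table standing in for a HowNet-based
# measure. The values are illustrative only, not real HowNet output.
WORD_SIM: Dict[Tuple[str, str], float] = {
    ("price", "cost"): 0.9,
    ("house", "housing"): 0.95,
    ("price", "football"): 0.1,
}


def word_similarity(a: str, b: str) -> float:
    """Symmetric lookup; identical words score 1.0, unknown pairs 0.0."""
    if a == b:
        return 1.0
    return WORD_SIM.get((a, b), WORD_SIM.get((b, a), 0.0))


def directed_similarity(src: Dict[str, float], dst: Dict[str, float]) -> float:
    """Weighted average, over src keywords, of the best match found in dst."""
    total = sum(src.values())
    if total == 0 or not dst:
        return 0.0
    score = 0.0
    for word, weight in src.items():
        best = max(word_similarity(word, other) for other in dst)
        score += weight * best
    return score / total


def list_similarity(topic: Dict[str, float], thread: Dict[str, float]) -> float:
    """Assumed aggregation: mean of the two directed similarities."""
    return 0.5 * (directed_similarity(topic, thread)
                  + directed_similarity(thread, topic))


def is_on_topic(topic: Dict[str, float], thread: Dict[str, float],
                threshold: float = 0.5) -> bool:
    """Treat the thread as related when relevance reaches the (assumed) threshold."""
    return list_similarity(topic, thread) >= threshold
```

Here each model is a keyword list mapping a term to its weight. A thread with keywords such as "housing" and "cost" would score as relevant to a topic with keywords "house" and "price" even though no word matches exactly, which is the situation the semantic approach is meant to handle.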
Keywords/Search Tags: BBS, Hot Topic Detection, Key Topic Tracking, Vector Space Model, Hot Topic Feature, Topic Ranking, Key Words Table, Semantic Similarity