
Forum Based Topic Detection And Tracking Algorithms Study

Posted on: 2014-02-11 | Degree: Master | Type: Thesis
Country: China | Candidate: L H Wu
GTID: 2248330398472249 | Subject: Signal and Information Processing
Abstract/Summary:
The Internet has become people's main source of information because of its abundant resources, effectiveness, and broad coverage. Research on topic detection and tracking (TDT) theory, together with topic-based indexing and categorization of Internet content, helps users find the information they are interested in. It also helps governments, colleges, and other institutions gather public-opinion information promptly, which supports Internet security and social stability. This thesis builds a forum-based TDT model for two reasons: forums usually carry complete information with broad coverage, and forum users are active and willing to take part in online chats and discussions.

First, a conventional TDT scheme based on hierarchical agglomerative clustering (HAC) is designed and implemented, and it is deployed in a real public-opinion surveillance and management system. In addition, a hot-topic formula is proposed to retrieve the top emerging topics.

Second, a new TDT algorithm based on credible association rule mining is developed to improve TDT performance and overcome the shortcomings of HAC. Traditional methods cluster documents by comparing document-to-document similarity. The new method instead clusters the terms within each document directly. It shows lower time complexity and better performance in online topic detection. Experiments indicate that it is best suited to short texts such as tweets and forum posts.

Finally, the TDT algorithm based on credible association rule mining is applied to a real public-opinion surveillance and management system.
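The sketch below illustrates the core idea of the second algorithm: it clusters terms instead of comparing whole documents. The abstract does not define "credible" association rules, so the example uses a simple stand-in. A term pair is kept only if its support passes a fixed threshold and its confidence passes another threshold in both directions. The threshold names `min_support` and `min_confidence` and their values are hypothetical. Terms linked by kept rules are merged into topic clusters with union-find, and each post is assigned to the cluster it shares the most terms with. All function names are illustrative and are not taken from the thesis.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(docs, min_support=0.2, min_confidence=0.6):
    """Return term pairs (a, b) whose rules a->b and b->a both pass the thresholds.

    Assumed stand-in for "credible" rules (not the thesis's definition):
      support(a, b)      = posts containing both / all posts
      confidence(a -> b) = posts containing both / posts containing a
    """
    n = len(docs)
    term_df = Counter()  # number of posts containing each term
    pair_df = Counter()  # number of posts containing each term pair
    for doc in docs:
        terms = sorted(set(doc))
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))
    rules = []
    for (a, b), both in pair_df.items():
        if both / n < min_support:
            continue
        if both / term_df[a] >= min_confidence and both / term_df[b] >= min_confidence:
            rules.append((a, b))
    return rules


def cluster_terms(rules):
    """Group terms into topics: connected components of the rule graph (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in rules:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    groups = {}
    for term in list(parent):
        groups.setdefault(find(term), set()).add(term)
    return sorted(groups.values(), key=sorted)


def assign_documents(docs, topics):
    """Attach each post to the topic sharing the most terms with it (None if no overlap)."""
    labels = []
    for doc in docs:
        overlaps = [len(topic & set(doc)) for topic in topics]
        best = max(range(len(topics)), key=overlaps.__getitem__, default=None)
        labels.append(best if best is not None and overlaps[best] > 0 else None)
    return labels


# Toy, already-tokenized forum posts (hypothetical data).
docs = [
    ["earthquake", "rescue", "city"],
    ["earthquake", "rescue", "donation"],
    ["earthquake", "rescue"],
    ["exam", "score", "college"],
    ["exam", "score"],
    ["weather"],
]
rules = mine_pair_rules(docs)
topics = cluster_terms(rules)
labels = assign_documents(docs, topics)
print([sorted(t) for t in topics])  # [['earthquake', 'rescue'], ['exam', 'score']]
print(labels)  # [0, 0, 0, 1, 1, None]
```

Counting term pairs costs roughly the sum of the squared post lengths, and this stays small for short posts. That may be one reason term-level clustering can be cheaper than HAC-style methods, which compare every pair of documents.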
In operation, deficiencies in the algorithm cause some "garbage" topics to be generated. To address this, several optimization methods were implemented and tested, and a new topic detection and tracking system based on credible association rule mining is proposed. The new method uses a maximal quasi-clique mining algorithm as its core feature-clustering algorithm. In real-world use, this algorithm substantially improves topic detection and tracking performance, which demonstrates its effectiveness.
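A quasi-clique relaxes the clique requirement. In a common degree-based definition, a vertex set S is a γ-quasi-clique if every member is adjacent to at least γ(|S| - 1) other members. The sketch below grows quasi-cliques greedily over a term co-occurrence graph and discards small groups as likely "garbage" topics. It is a heuristic built on stated assumptions: γ = 0.6, a minimum topic size of 3, and greedy growth from high-degree seed terms. It is not the exact maximal quasi-clique mining algorithm used in the thesis, whose details the abstract does not give. The function names and the toy graph are hypothetical.

```python
def build_graph(edges):
    """Undirected term graph as an adjacency dict of sets (no self-loops)."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def is_quasi_clique(graph, nodes, gamma):
    """Degree-based gamma-quasi-clique test: every member is adjacent to at
    least gamma * (|nodes| - 1) other members."""
    k = len(nodes)
    if k <= 1:
        return True
    need = gamma * (k - 1)
    return all(len(graph[v] & nodes) >= need for v in nodes)


def greedy_quasi_clique(graph, seed, gamma=0.6):
    """Grow a quasi-clique from `seed`. Each step adds the candidate with the most
    links into the current set that keeps the set a quasi-clique. Stops when no
    single vertex can be added (heuristic, not exact enumeration)."""
    clique = {seed}
    while True:
        candidates = set().union(*(graph[v] for v in clique)) - clique
        best = None
        for c in sorted(candidates, key=lambda u: (-len(graph[u] & clique), u)):
            if is_quasi_clique(graph, clique | {c}, gamma):
                best = c
                break
        if best is None:
            return clique
        clique.add(best)


def quasi_clique_topics(graph, gamma=0.6, min_size=3):
    """Cover the term graph with greedy quasi-cliques, seeding from high-degree
    terms. Groups smaller than the assumed `min_size` are dropped as noise."""
    topics = []
    for seed in sorted(graph, key=lambda v: (-len(graph[v]), v)):
        if any(seed in t for t in topics):
            continue
        group = greedy_quasi_clique(graph, seed, gamma)
        if len(group) >= min_size:
            topics.append(frozenset(group))
    return topics


# Toy co-occurrence edges (hypothetical). "weather"-"score" is a spurious link.
edges = [
    ("earthquake", "rescue"), ("earthquake", "donation"), ("earthquake", "victims"),
    ("rescue", "donation"), ("rescue", "victims"),
    ("exam", "score"), ("exam", "college"), ("score", "college"),
    ("score", "weather"),
]
topics = quasi_clique_topics(build_graph(edges))
print([sorted(t) for t in topics])
# [['donation', 'earthquake', 'rescue', 'victims'], ['college', 'exam', 'score']]
```

The first group is kept even though "donation" and "victims" are not directly linked, which is the kind of tolerance a quasi-clique allows and a strict clique does not. The spurious "weather" link produces only a two-term group, and the size filter removes it.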
Keywords/Search Tags: topic detection and tracking, feature selection, hierarchical agglomerative clustering, credible association rule mining, maximal clique mining