Research On Algorithm Of Topic Detection And Tracking

Posted on: 2011-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: M Z Zhang
GTID: 2178360305460388
Subject: Computer Science and Technology
Abstract/Summary:
Topic detection and tracking (TDT) is a new research area of natural language processing that aims to help people cope with the problem of information explosion. A TDT system automatically detects new topics and tracks known topics in the stream of stories from news media, then organizes the results and presents them to users. TDT comprises two tasks: topic detection and topic tracking. A topic detection and tracking system consists of five main modules: news preprocessing, feature extraction, weight calculation, construction of the topic vector model, and similarity calculation.

The traditional topic detection algorithm based on hierarchical clustering has excessive computational overhead. Building on that algorithm, this thesis proposes and implements a new topic detection algorithm that dynamically updates the features of each topic's space vector during hierarchical clustering and establishes rules for combining news stories. Experiments on the TDT5 corpus indicate that the new algorithm improves the accuracy of topic detection and reduces the computational overhead of processing news data.

Using the idea of adaptive information filtering, the thesis also presents a module that automatically updates the features of the topic vector. This module avoids the problem of sparse training data in traditional topic tracking. Because news is dynamic and changes over time, we further propose an adaptive topic tracking algorithm with a dynamic threshold based on temporal information. Experiments show that, compared with previous algorithms, our methods give the topic tracking system better performance.
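The five-module pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not say which weighting scheme or similarity measure was used, so TF-IDF weighting and cosine similarity over sparse term vectors are assumed here, and the stop-word list and the function names `preprocess`, `tfidf_vectors` and `cosine` are hypothetical. Simple English tokenization stands in for the preprocessing a real news stream would need.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, far smaller than a real one.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "to", "is", "for"}


def preprocess(text: str) -> list[str]:
    """News preprocessing and feature extraction: lowercase, tokenize, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight calculation (assumed TF-IDF): weight(t, d) = tf(t, d) * log(N / df(t))."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Each story becomes a sparse vector in the vector space model.
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Similarity calculation: cosine of two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


stories = [
    "Earthquake hits the coast of Chile",
    "Chile earthquake death toll rises",
    "Stock markets rally on tech earnings",
]
vecs = tfidf_vectors([preprocess(s) for s in stories])
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # True
```

The two earthquake stories share weighted terms and get a positive similarity, while the market story shares none with them and scores zero.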
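The topic detection idea, hierarchical clustering in which a topic's vector features are renewed whenever stories are combined, might look roughly like the sketch below. The abstract specifies neither the renewal rule nor the combination rules, so this sketch assumes (hypothetically) that a merged topic's vector is the average of its member stories truncated to its `top_k` heaviest terms, and that two clusters are combined only while their cosine similarity is at least `merge_threshold`. Truncating topic vectors is one plausible way to bound the cost of each similarity computation; it is an assumption, not the thesis's stated mechanism.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def renew_centroid(members: list[dict[str, float]], top_k: int) -> dict[str, float]:
    """Hypothetical renewal rule: average the members, keep the top_k heaviest terms."""
    total: dict[str, float] = {}
    for vec in members:
        for term, w in vec.items():
            total[term] = total.get(term, 0.0) + w
    avg = {term: w / len(members) for term, w in total.items()}
    ranked = sorted(avg.items(), key=lambda kv: (-kv[1], kv[0]))  # ties broken alphabetically
    return dict(ranked[:top_k])


def detect_topics(vectors: list[dict[str, float]],
                  merge_threshold: float = 0.3,
                  top_k: int = 20) -> list[list[int]]:
    """Agglomerative clustering that renews each topic vector after every merge."""
    # Each cluster is (indices of member stories, renewed topic vector).
    clusters = [([i], renew_centroid([v], top_k)) for i, v in enumerate(vectors)]
    while len(clusters) > 1:
        # Find the most similar pair of current topic vectors.
        best, pair = -1.0, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = cosine(clusters[a][1], clusters[b][1])
                if sim > best:
                    best, pair = sim, (a, b)
        # Hypothetical combination rule: stop once no pair is similar enough.
        if best < merge_threshold:
            break
        a, b = pair
        members = clusters[a][0] + clusters[b][0]
        # Recompute the topic vector from the original story vectors.
        clusters[a] = (members, renew_centroid([vectors[i] for i in members], top_k))
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(m) for m, _ in clusters]


if __name__ == "__main__":
    toy = [
        {"quake": 1.0, "chile": 1.0},
        {"quake": 1.0, "chile": 1.0, "toll": 0.5},
        {"stocks": 1.0, "rally": 1.0},
        {"stocks": 1.0, "earnings": 1.0},
    ]
    print(detect_topics(toy))  # [[0, 1], [2, 3]]
```

The exhaustive pair search is cubic in the number of stories overall and is kept only for clarity; a practical system would cache pairwise similarities or restrict candidates.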
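The adaptive tracking step can be sketched in the same spirit. For illustration only, it is assumed that the threshold rises with the time elapsed since the topic last accepted a story, θ(Δt) = θ₀ + α(1 − exp(−Δt/τ)), and that each accepted story is blended into the topic vector (a Rocchio-style update, one common form of adaptive filtering) before the vector is truncated to `top_k` terms. The class `TopicTracker`, its parameters and their defaults are hypothetical; the thesis's actual threshold function and update rule are not given in the abstract.

```python
import math
from dataclasses import dataclass


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


@dataclass
class TopicTracker:
    """Hypothetical adaptive tracker; parameter names and defaults are illustrative."""
    topic: dict[str, float]       # current topic vector (built from training stories)
    last_time: float              # time (in days) of the most recent on-topic story
    base_threshold: float = 0.2   # theta_0: threshold right after an on-topic story
    max_increase: float = 0.3     # alpha: how far the threshold can rise over a long gap
    decay_days: float = 7.0       # tau: time scale of that rise
    beta: float = 0.3             # weight of an accepted story in the topic update
    top_k: int = 20               # number of features kept in the topic vector

    def threshold(self, t: float) -> float:
        """Assumed dynamic threshold: theta_0 + alpha * (1 - exp(-gap / tau))."""
        gap = max(0.0, t - self.last_time)
        return self.base_threshold + self.max_increase * (1.0 - math.exp(-gap / self.decay_days))

    def process(self, story: dict[str, float], t: float) -> bool:
        """Decide whether a story arriving at time t is on topic; adapt if so."""
        on_topic = cosine(self.topic, story) >= self.threshold(t)
        if on_topic:
            self._renew(story)
            self.last_time = t
        return on_topic

    def _renew(self, story: dict[str, float]) -> None:
        """Blend the story into the topic vector, then keep the top_k features."""
        terms = set(self.topic) | set(story)
        mixed = {
            term: (1 - self.beta) * self.topic.get(term, 0.0) + self.beta * story.get(term, 0.0)
            for term in terms
        }
        ranked = sorted(mixed.items(), key=lambda kv: (-kv[1], kv[0]))
        self.topic = dict(ranked[: self.top_k])


if __name__ == "__main__":
    marginal = {"quake": 1.0, "aid": 1.0, "relief": 1.0, "funds": 1.0}
    soon = TopicTracker(topic={"quake": 1.0, "chile": 1.0}, last_time=0.0)
    late = TopicTracker(topic={"quake": 1.0, "chile": 1.0}, last_time=0.0)
    # Same story, same similarity (about 0.35); only the arrival time differs.
    print(soon.process(marginal, 1.0), late.process(marginal, 60.0))  # True False
```

Under these assumptions, a story arriving long after a topic's last activity needs stronger lexical evidence to be linked to it, and updating the vector from accepted stories lets a tracker that starts from very few training stories enrich its topic representation over time.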
Keywords/Search Tags: topic detection, topic tracking, temporal information, topic renewal, hierarchical clustering