
Design And Implementation Of Text-oriented Data Topic Detection And Tracking System

Posted on: 2016-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: S Liu
Full Text: PDF
GTID: 2348330479454723
Subject: Computer technology
Abstract/Summary:
Keyword-matching search engines return a large number of records that are irrelevant to the given topic, and finding the records about a specific topic among them takes considerable time and effort. Topic-based information filtering has therefore become a significant research field. Traditional topic detection and tracking is implemented mainly by clustering documents hierarchically or by clustering information about how keywords are distributed across documents. These approaches cannot handle an online document stream, require a large amount of memory, and have high computational complexity.

If two keywords appear together in many documents, the probability that they belong to the same topic is high. This paper studies topic detection and tracking based on a graph that describes keyword co-occurrences. The co-occurrence graph is built after Chinese word segmentation and is then modified to increase the influence of informative keywords. To filter out noise in the graph and to reduce the time and space cost of subsequent analysis based on keyword feature vectors, the keywords are mapped into a low-dimensional feature space according to the co-occurrence graph, preserving as much of the co-occurrence information between keywords as possible. The correlation between keywords and topics is analyzed in this low-dimensional space by soft clustering, which is consistent with the fact that a keyword can be associated with multiple topics. The results of the soft clustering provide the basis for extracting topic features. Finally, topic detection and tracking is carried out by calculating the degree of correlation between the feature vectors of topics and those of documents.
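
The graph-construction step described above can be sketched roughly as follows. This is only an illustration, not the thesis's implementation: the function name `build_cooccurrence_graph` is hypothetical, the input is assumed to be already segmented into tokens, and the IDF-style edge weighting is an assumed stand-in for the unspecified modification that boosts informative keywords.

```python
import math
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(docs):
    """Build a weighted keyword co-occurrence graph.

    docs: list of token lists (word segmentation is assumed to be done already).
    Returns a dict mapping an alphabetically ordered keyword pair (a, b)
    to an edge weight.
    """
    n_docs = len(docs)
    doc_freq = Counter()    # number of documents containing each keyword
    pair_count = Counter()  # number of documents containing both keywords

    for tokens in docs:
        vocab = sorted(set(tokens))  # count each keyword once per document
        doc_freq.update(vocab)
        pair_count.update(combinations(vocab, 2))

    # Assumed weighting (not from the abstract): keywords that occur in fewer
    # documents get a larger IDF factor, so their edges carry more weight.
    idf = {w: math.log(n_docs / df) + 1.0 for w, df in doc_freq.items()}
    return {
        (a, b): count * idf[a] * idf[b]
        for (a, b), count in pair_count.items()
    }
```

Under this assumed weighting, a pair of keywords that co-occurs in many documents gets a heavier edge, while rare keywords contribute more per co-occurrence than very common ones. The dimensionality reduction and soft clustering steps would then operate on this graph.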
Feedback information is used to dynamically update the topic features, giving the topic detection and tracking system the ability of knowledge-based self-adaptation.

Experimental results show that the topic detection and tracking algorithm based on keyword co-occurrence achieves considerable improvement in knowledge-based self-adaptation, accuracy, and efficiency. In the next phase, we will compare this system with other topic detection systems on the same data set to analyze its performance, and we will study whether the contextual relationships between keywords can improve system performance on top of keyword co-occurrence.
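
As a rough illustration of the tracking and feedback steps, the sketch below scores a document against topic feature vectors and then adjusts a topic vector from user feedback. Everything here is an assumption made for illustration: the abstract does not name the correlation measure or the update rule, so cosine similarity and a simple additive (Rocchio-like) update are used, and the names `cosine`, `track`, `apply_feedback`, `threshold`, and `rate` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {keyword: weight}."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def track(doc_vec, topics, threshold=0.3):
    """Return the names of topics the document matches, best match first.

    topics: {topic_name: feature_vector}. The threshold is an assumed parameter.
    """
    scores = {name: cosine(doc_vec, vec) for name, vec in topics.items()}
    matched = [name for name, score in scores.items() if score >= threshold]
    return sorted(matched, key=lambda name: -scores[name])


def apply_feedback(topic_vec, doc_vec, relevant, rate=0.2):
    """Return a new topic vector adjusted by one piece of feedback.

    Assumed rule: move toward a document marked relevant and away from one
    marked irrelevant; weights that fall to zero or below are dropped.
    """
    sign = 1.0 if relevant else -1.0
    keys = set(topic_vec) | set(doc_vec)
    updated = {
        k: topic_vec.get(k, 0.0) + sign * rate * doc_vec.get(k, 0.0)
        for k in keys
    }
    return {k: w for k, w in updated.items() if w > 0.0}
```

With this kind of update, a document confirmed as relevant scores higher against the updated topic vector than it did before, which is one simple way to realize the self-adaptation the abstract describes.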
Keywords/Search Tags: topic detection and tracking, co-occurrence graph, knowledge-based self-adaptation