Research On Internet Topic Detection And Tracking Based On Blog

Posted on: 2012-10-16
Degree: Master
Type: Thesis
Country: China
Candidate: R Y Ding
Full Text: PDF
GTID: 2178330335950923
Subject: Communication and Information System
ABSTRACT: With the rapid development and spread of information technology, a huge amount of information appears on the Internet every day. How to detect topics in this mass of information precisely and quickly, and how to track those topics once they have been detected, is one of the focuses of current research.

This paper starts with the translation of text into a vector space. It first introduces the common methods for building the vector space and then, after analyzing their shortcomings, proposes an improved TFIDF method. The improved method is better suited to the characteristics of blog topic detection and aims to produce more precise results.

Secondly, on the basis of feature selection, the paper reduces the dimensionality of the vector space by means of feature reconstruction. Through a careful study of Latent Semantic Indexing and Singular Value Decomposition, we analyze the effectiveness and feasibility of applying them to blog topic detection.

Thirdly, after comparing the common clustering methods and analyzing their shortcomings, the paper proposes a new HD-K-means clustering method. Its main purpose is to solve two problems: the clustering results depend closely on the initial parameters, and they are often only locally optimal. By picking a topic label from the clustering results, we construct a basic method for topic detection.

Next, after comparing the performance and characteristics of commonly used classification methods, we use a Naive Bayes classifier to implement topic tracking. Taking the characteristics of blogs into account, we improve the formulas used for classification feature selection and for conditional probability calculation.

Finally, we summarize the work of the paper and, by analyzing the experimental results, outline directions for future research.
Keywords/Search Tags: vector space, dimension reduction, clustering, topic detection, classifier, topic tracking
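
The detection pipeline summarized in the abstract (text to vector space, then clustering) can be illustrated with a minimal sketch. It is not the thesis's method: it uses the textbook TF-IDF weighting rather than the improved variant, plain k-means with cosine similarity rather than HD-K-means, and it skips the dimension reduction step. The function names (`tfidf_vectors`, `cosine`, `kmeans`) and the four sample posts are assumptions made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) using plain tf * log(N / df) weighting."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: the number of posts that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append([
            (counts[term] / total) * math.log(n_docs / df[term])
            for term in vocab
        ])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def kmeans(vectors, initial_centroids, iterations=10):
    """Plain k-means with caller-supplied initial centroids (hypothetical helper)."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each post joins its most similar centroid.
        labels = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up sample posts covering two topics.
posts = [
    "earthquake rescue teams arrive",
    "earthquake rescue donations rise",
    "football league final tonight",
    "football league transfer news",
]
vocab, vectors = tfidf_vectors(posts)
labels = kmeans(vectors, [vectors[0], vectors[2]])
print(labels)
```

Because the initial centroids are passed in explicitly, the sketch also makes visible the sensitivity to initialization that the abstract names as a weakness of common clustering methods: a different choice of starting posts can lead to a different grouping.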