
Design and Implementation of the Micro-blog Topic Detection System Based on Incremental Clustering

Posted on: 2013-09-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xia
GTID: 2268330392462823
Subject: Software engineering

Abstract/Summary:
Topic detection is a sub-task of Topic Detection and Tracking (TDT). It aims to discover topics by exploring and organizing text material and to identify previously unknown topics online. Traditional topic detection techniques mainly address corpora from news sites, and little research has targeted topic detection on micro-blog platforms.

We analyze in depth the structural characteristics of micro-blog text and, from this analysis, derive the main requirements of micro-blog topic detection, in particular for information gathering, text clustering and topic heat calculation. Based on this research, our main contributions are as follows:

(1) We propose an information gathering and extraction method for micro-blogs. During gathering, it analyzes the characteristics of the author and the features of the text to decide whether a micro-blog post should be included. The method improves both the quality of the collected posts and the efficiency of collection.

(2) We propose an incremental clustering algorithm for topic detection. Because the language used in micro-blogs is flexible, texts are represented with the vector space model and clustered with an incremental DBSCAN algorithm. The algorithm processes new data incrementally and uses an optimized clustering strategy, improving both the quality and the efficiency of topic detection.

(3) We propose a heat calculation algorithm based on user attention and topic focus. It calculates the heat of a topic accurately and helps present the results of topic detection in a more scientific and reasonable way.

Based on the above work, we design and implement a topic detection system. The system provides a visual, macro-level way to explore and track the topics discussed on micro-blogs, and markedly improves the efficiency of information extraction.
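The abstract names the clustering approach (vector space model plus incremental DBSCAN) but gives no details. The Python sketch below illustrates that general technique under stated assumptions; it is not the thesis's implementation. It assumes posts are already word-segmented into token lists, uses raw term-frequency weights with cosine distance (the thesis may weight terms differently, e.g. with TF-IDF), and handles insertions only, in the spirit of the incremental DBSCAN of Ester et al. (1998): a new post that creates no new core point becomes noise or a border point; otherwise it creates a new cluster (topic), is absorbed into one, or merges several. The names `tf_vector`, `cosine_distance` and `IncrementalDBSCAN`, and the parameter values `eps=0.5` and `min_pts=2`, are hypothetical.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Vector space model (assumed raw TF weights): sparse {term: count}."""
    return dict(Counter(tokens))


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse vectors; 1.0 if either is empty."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


class IncrementalDBSCAN:
    """Hypothetical insertion-only incremental DBSCAN over post vectors."""

    NOISE = -1

    def __init__(self, eps, min_pts):
        self.eps = eps              # max cosine distance for two posts to be neighbors
        self.min_pts = min_pts      # neighborhood size (self included) for a core point
        self.points = []
        self.neighbors = []         # neighbors[i]: indices within eps of i, including i
        self.labels = []            # cluster (topic) id per post, or NOISE
        self._next_id = 0

    def _is_core(self, i):
        return len(self.neighbors[i]) >= self.min_pts

    def insert(self, vec):
        """Add one post, update affected clusters, return the post's label."""
        new = len(self.points)
        self.points.append(vec)
        self.labels.append(self.NOISE)
        self.neighbors.append({new})
        for j in range(new):        # linear scan; a real system would use an index
            if cosine_distance(vec, self.points[j]) <= self.eps:
                self.neighbors[new].add(j)
                self.neighbors[j].add(new)

        # Points that are core now but were not core before this insertion.
        new_cores = [q for q in self.neighbors[new] if self._is_core(q)
                     and (q == new or len(self.neighbors[q]) - 1 < self.min_pts)]
        if not new_cores:
            # Density unchanged: join a neighboring core's cluster as a border point.
            cores = [q for q in self.neighbors[new] if q != new and self._is_core(q)]
            if cores:
                self.labels[new] = self.labels[cores[0]]
            return self.labels[new]

        # Core points directly connected to the change ("UpdSeed" in Ester et al.).
        seeds = {c for q in new_cores for c in self.neighbors[q] if self._is_core(c)}
        existing = {self.labels[c] for c in seeds if self.labels[c] != self.NOISE}
        if existing:
            target = min(existing)  # absorption (one id) or merge (several ids)
            for i, lab in enumerate(self.labels):
                if lab in existing:
                    self.labels[i] = target
        else:
            target = self._next_id  # creation of a new cluster, i.e. a new topic
            self._next_id += 1
        for c in seeds:
            self.labels[c] = target
        for q in new_cores:         # neighbors of new cores become members or borders
            for p in self.neighbors[q]:
                self.labels[p] = target
        return self.labels[new]


if __name__ == "__main__":
    posts = [
        ["earthquake", "sichuan"], ["earthquake", "sichuan", "rescue"],
        ["iphone", "launch"], ["iphone", "launch", "price"], ["football"],
    ]
    model = IncrementalDBSCAN(eps=0.5, min_pts=2)
    for tokens in posts:
        model.insert(tf_vector(tokens))
    print(model.labels)  # [0, 0, 1, 1, -1]
```

Because only the neighborhoods touched by a new post are re-examined, earlier clustering work is reused rather than recomputed, which is the property the abstract relies on for streaming micro-blog data. This sketch omits deletion and ageing of old posts, which a real system would also need.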
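The abstract does not state the heat formula. As a purely hypothetical illustration of how "user attention" and "topic focus" might be combined, the sketch below scores each detected topic as a weighted sum of two terms in [0, 1]: attention, a log-damped count of reposts and comments normalized by the largest topic's value, and focus, the topic's share of the clustered posts in a time window. The `Post` fields, the log damping and the weight `alpha` are all assumptions, not the thesis's algorithm.

```python
import math
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical post record produced after clustering."""
    topic: int      # cluster id from topic detection; -1 marks noise
    reposts: int
    comments: int


def topic_heat(posts, alpha=0.5):
    """Hypothetical heat per topic: alpha * attention + (1 - alpha) * focus.

    attention: sum of log(1 + reposts + comments) over the topic's posts,
               divided by the largest such sum among topics.
    focus:     fraction of the window's clustered (non-noise) posts in the topic.
    """
    clustered = [p for p in posts if p.topic != -1]
    if not clustered:
        return {}
    attention, counts = {}, {}
    for p in clustered:
        attention[p.topic] = attention.get(p.topic, 0.0) + math.log1p(p.reposts + p.comments)
        counts[p.topic] = counts.get(p.topic, 0) + 1
    max_attention = max(attention.values()) or 1.0   # avoid dividing by zero
    return {
        t: alpha * attention[t] / max_attention + (1 - alpha) * counts[t] / len(clustered)
        for t in counts
    }


if __name__ == "__main__":
    window = [Post(0, 100, 20), Post(0, 10, 5), Post(1, 0, 0), Post(1, 3, 1), Post(-1, 999, 999)]
    heat = topic_heat(window)
    for topic, score in sorted(heat.items(), key=lambda kv: -kv[1]):
        print(topic, round(score, 3))
```

The log damping keeps a single viral post from dominating a topic's score, and normalizing both terms lets `alpha` trade attention against focus directly; noise posts are excluded from both terms.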
Keywords/Search Tags: Micro-blog Topic, Information Collection, Topic Detection, Topic Heat