Improvement Of Single-Pass Clustering Algorithm And Its Application In Microblog Topic Detection

Posted on: 2017-03-03 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Li
GTID: 2358330482991354 | Subject: Engineering
Abstract/Summary:
In recent years, with the rapid development of the Internet, micro-blogging has become increasingly popular as a convenient and efficient communication tool. Because micro-blogs deliver messages instantly, information spreads quickly, so micro-blog content both has strong social influence and carries high information value. However, the fission-like spread of posts and the autonomy of users make the volume of micro-blog text grow explosively, leaving users with information that is too abundant and too scattered to browse. It is therefore of great practical significance and research value to help people quickly find valuable information in the vast volume of micro-blog posts and to follow topics of interest and their development trends in a timely manner. On this basis, this thesis focuses on the following three aspects.

First, it introduces micro-blog concepts and the two commonly used techniques for acquiring micro-blog information: collection with a web crawler and collection through the micro-blog API. The advantages and disadvantages of each are analyzed, and because the main object of study is topic detection on micro-blog text, the latter (API-based collection) is used to obtain micro-blog data.

Second, based on an analysis of the basic topic detection process, the micro-blog data is first preprocessed to filter out large amounts of junk information. A model of the micro-blog text is then built through text representation, feature extraction, and feature weight calculation, after which text similarity is calculated. To motivate the Single-Pass clustering algorithm used in this thesis, four commonly used text clustering algorithms are briefly introduced, and their advantages and disadvantages are analyzed and compared to determine the clustering method adopted.

Finally, the performance of the improved Single-Pass clustering algorithm is tested with three commonly used topic detection evaluation metrics: recall, false positive rate, and false negative rate. The improved algorithm is applied to topic detection in a micro-blog topic detection system, where its results are more accurate and its time cost is lower.
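The core of the pipeline described above (feature weighting, text similarity, then Single-Pass clustering) can be sketched in Python. This is a minimal sketch of the standard, unimproved Single-Pass algorithm under several assumptions. Posts are assumed to be already segmented into tokens; Chinese posts would need word segmentation first. Feature weights are assumed to be a smoothed TF-IDF. Similarity is assumed to be cosine similarity against a cluster centroid. The threshold (a hypothetical 0.2 here) is assumed to be chosen by hand. The thesis's specific improvement is not described in this abstract, so it is not reproduced. The names `tf_idf_vectors`, `cosine`, and `single_pass` are illustrative only.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Map tokenized posts to sparse TF-IDF vectors (dict: term -> weight).

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1, so that terms
    occurring in every post still get a nonzero weight.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass(vectors, threshold):
    """Basic Single-Pass clustering: each post is visited once, in order.

    A post joins the most similar existing cluster if that similarity is at
    least `threshold` (a hypothetical value chosen by the caller); otherwise
    it starts a new cluster, i.e. a new candidate topic.
    """
    centroids = []  # per-cluster sum of member vectors
    members = []    # per-cluster list of post indices
    for i, vec in enumerate(vectors):
        best, best_sim = -1, 0.0
        for k, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = k, sim
        if best >= 0 and best_sim >= threshold:
            members[best].append(i)
            # The sum points in the same direction as the mean, and cosine
            # similarity ignores vector length, so no division is needed.
            for t, w in vec.items():
                centroids[best][t] = centroids[best].get(t, 0.0) + w
        else:
            centroids.append(dict(vec))
            members.append([i])
    return members


# Usage: four toy posts, tokenized here by splitting on whitespace.
posts = [
    "earthquake hits city rescue teams",
    "rescue teams search earthquake survivors",
    "new phone launch event today",
    "phone launch draws big crowds",
]
clusters = single_pass(tf_idf_vectors([p.split() for p in posts]), threshold=0.2)
print(clusters)  # [[0, 1], [2, 3]]
```

Each post is compared only with the existing cluster centroids and is assigned in a single pass. The cost therefore grows with the number of posts times the number of clusters, which suits streaming micro-blog data. The trade-off is that the result depends on the order in which posts arrive and on the threshold value.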
Keywords/Search Tags: micro-blog, information acquisition, Single-Pass algorithm, topic detection