
Research On Text Clustering And Its Application In Topic Detection Analysis

Posted on: 2016-12-23 | Degree: Master | Type: Thesis
Country: China | Candidate: Y H Lv
GTID: 2308330473465471 | Subject: Data mining
Abstract/Summary:
With the rapid development of the Internet, it has been adopted across many industries, changing how people work and live while also bringing unprecedented challenges. As Internet technology advances, collecting and organizing relevant information becomes increasingly difficult, and obtaining the information one needs from this flood of data has become an urgent problem. Topic Detection and Tracking (TDT) is a technology proposed to address this problem, namely how to extract important information from news and other data streams. Topic detection is one of the main research tasks in TDT; its core goal is to cluster reports of similar events together. TDT is therefore a worthwhile subject for in-depth study. The main work of this thesis is as follows:
(1) K-means is one of the most commonly used clustering algorithms and has been widely applied because it is conceptually simple and clusters quickly. However, because standard K-means chooses its initial cluster centers arbitrarily, it often converges to a local optimum, which yields poor clustering. To solve this problem, this thesis proposes an adaptive clustering algorithm that selects initial centers based on the maximum and minimum distances between data instances and on the sum of squared errors (SSE), and that determines the number of clusters automatically. Experimental results show that the proposed algorithm produces more accurate clustering results without increasing the number of iterations.
(2) With the rapid development of the Internet, news reports of all kinds generate large volumes of information, so extracting valuable information from them has become increasingly important.
Taking retrospective topic detection into account, together with the fact that the data keep growing during testing, this thesis proposes a topic feature selection method that incorporates part-of-speech features and adjusts the weights of topic features that have a high ability to discriminate between topics. Experimental results show that this method improves topic detection performance and that the extracted features match the actual topic features more closely, indicating that the method is feasible and effective.
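The initialization idea in item (1) can be illustrated with a minimal sketch, assuming the classic max-min distance heuristic for seeding and an SSE-based stopping rule chosen here purely for illustration; the abstract does not state the thesis's exact criterion for fixing the number of clusters. The first center is assumed to be the point farthest from the data mean, and the function names and the `min_gain` threshold (minimum relative SSE reduction needed to accept one more cluster) are hypothetical.

```python
import numpy as np

def maxmin_centers(X, k):
    """Max-min distance seeding: start from the point farthest from the data
    mean (an assumption), then repeatedly add the point whose distance to its
    nearest already-chosen center is largest."""
    first = int(np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1)))
    idx = [first]
    min_d = np.linalg.norm(X - X[first], axis=1)  # distance to nearest chosen center
    while len(idx) < k:
        nxt = int(np.argmax(min_d))
        idx.append(nxt)
        min_d = np.minimum(min_d, np.linalg.norm(X - X[nxt], axis=1))
    return X[idx].astype(float)

def assign(X, centers):
    """Index of the nearest center for every row of X."""
    return np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)

def kmeans(X, centers, max_iter=100):
    """Standard Lloyd iterations from the given (deterministic) seeds."""
    for it in range(1, max_iter + 1):
        labels = assign(X, centers)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    labels = assign(X, centers)
    sse = float(((X - centers[labels]) ** 2).sum())  # sum of squared errors
    return labels, centers, sse, it

def adaptive_kmeans(X, k_max=10, min_gain=0.5):
    """Hypothetical stopping rule: keep adding a cluster while doing so cuts
    SSE by at least `min_gain` as a fraction of the previous SSE."""
    best = kmeans(X, maxmin_centers(X, 1))
    for k in range(2, k_max + 1):
        cand = kmeans(X, maxmin_centers(X, k))
        if best[2] == 0 or (best[2] - cand[2]) / best[2] < min_gain:
            break
        best = cand
    return best
```

On a few well-separated groups of points, the relative SSE reduction drops sharply once k reaches the number of groups, so the loop stops there. Because max-min seeding is deterministic, repeated runs give the same result, which is the property that random initialization lacks.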
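The part-of-speech-aware feature weighting in item (2) might look roughly like the sketch below. It is not the thesis's actual formula: it assumes TF-IDF as the base term weight (the abstract does not name one), input that is already tokenized and tagged with Universal Dependencies tag names, and hypothetical per-tag multipliers in `POS_WEIGHT` that favor nouns and proper nouns as more topic-discriminative.

```python
import math
from collections import Counter

# Hypothetical POS multipliers: proper nouns and nouns are assumed to carry
# more topic information than verbs; all other tags get a low default.
POS_WEIGHT = {"PROPN": 1.5, "NOUN": 1.2, "VERB": 0.8}
DEFAULT_POS_WEIGHT = 0.3

def select_topic_features(docs, top_n=3):
    """docs: list of documents, each a list of (word, pos_tag) pairs.
    Returns, for each document, its top_n words by POS-adjusted TF-IDF."""
    n_docs = len(docs)
    df = Counter()  # document frequency of each word
    for doc in docs:
        df.update({w for w, _ in doc})
    result = []
    for doc in docs:
        tf = Counter(w for w, _ in doc)
        pos_of = {}
        for w, p in doc:
            pos_of.setdefault(w, p)  # keep the first tag seen for each word
        scores = {}
        for w, count in tf.items():
            idf = math.log(n_docs / df[w]) + 1.0  # +1 so words in every doc are not zeroed
            pos_w = POS_WEIGHT.get(pos_of[w], DEFAULT_POS_WEIGHT)
            scores[w] = (count / len(doc)) * idf * pos_w
        ranked = sorted(scores, key=lambda w: (-scores[w], w))  # ties broken alphabetically
        result.append(ranked[:top_n])
    return result
```

In a topic-detection pipeline, the selected features (or the full adjusted weight vectors) would then feed the clustering step; the multipliers are where the judgment about which features discriminate between topics is encoded.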
Keywords/Search Tags: K-means clustering algorithm, minimum and maximum distance, initial center, topic feature selection, topic discrimination ability