
Adaptive Clustering Algorithm Analysis Based on K-Means

Posted on: 2010-10-13
Degree: Master
Type: Thesis
Country: China
Candidate: L Liu
Full Text: PDF
GTID: 2178360278465889
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the rapid development of information technology, a new challenge has emerged: how to obtain and use huge amounts of information effectively. Traditional search methods require explicit queries, but users often cannot state precisely what they want. How to extract useful information from the Internet without such explicit queries has therefore become a meaningful research topic. Text mining is an effective way to extract useful information from such unstructured text data. Clustering is a key technique for text mining: it can discover useful data distributions and implicit data patterns, and it can find useful structure and clusters without background knowledge.

Against this background, this thesis first reviews the current status of clustering algorithms and introduces their relations to related research fields. To prepare for the later sections, we give mathematical formulations of the basic concepts in cluster analysis, such as similarity calculation and distance measures. We also analyze five traditional clustering algorithms and compare their performance. Based on an analysis of the merits and demerits of these algorithms, we propose an adaptive clustering algorithm that determines the number of categories automatically by finding the optimal solution of a discriminant function that we define. This approach avoids the subjectivity of choosing the number of clusters from human experience, and thereby avoids that drawback of the traditional clustering methods; its effectiveness is demonstrated by experiment.

The thesis then introduces a new topic detection system based on the proposed adaptive clustering algorithm. The system can discover implicit knowledge in a text information flow and can provide representative keywords for each topic according to its main idea. The experimental results show that the system discovers potential text topic information effectively, which further supports the validity of the adaptive clustering algorithm. Finally, we summarize the work in this thesis and discuss proposals for future work to improve the performance of the algorithm.
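The abstract does not state the discriminant function itself, so the following Python sketch only illustrates the general idea of choosing the number of clusters automatically. It runs K-means for several candidate values of k and keeps the k that maximizes a criterion. The mean silhouette coefficient is used here purely as a stand-in for the thesis's discriminant function. The function names (`init_centroids`, `kmeans`, `silhouette`, `adaptive_kmeans`), the farthest-point initialization, the `k_max` bound, and the toy data are all assumptions made for illustration, not details taken from the thesis.

```python
import math


def init_centroids(points, k):
    """Deterministic farthest-point initialization (an illustrative choice)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point whose nearest existing centroid is farthest away.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iters=50):
    """Plain Lloyd-style K-means; returns (centroids, labels)."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def silhouette(points, labels):
    """Mean silhouette coefficient (stand-in for the discriminant function)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items()
            if m != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def adaptive_kmeans(points, k_max=6):
    """Try k = 2..k_max and keep the clustering with the best criterion value."""
    best = None
    for k in range(2, k_max + 1):
        centroids, labels = kmeans(points, k)
        score = silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, k, centroids, labels)
    return best


# Toy data: three small, well-separated groups of 2-D points.
offsets = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)]
points = [
    (cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (5, 10)] for dx, dy in offsets
]

score, k, centroids, labels = adaptive_kmeans(points)
print("selected k:", k)
```

On this toy data the three groups are far apart, so the criterion should peak at k = 3. For real text data the points would be document feature vectors, and the criterion would be replaced by the discriminant function defined in the thesis.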
Keywords/Search Tags: Adaptive clustering, Topic detection, Discriminant function, Feature selection, Text mining, Named Entity