
Research On Improvement To Partitioning Clustering Algorithm And Density-based Clustering Algorithm

Posted on: 2008-09-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y J C Zhang
GTID: 2178360242967329
Subject: Software engineering
Abstract/Summary:
Data mining extracts previously unknown but useful knowledge from datasets that are massive, incomplete, noisy, fuzzy, and random. Cluster analysis, which discovers unknown clusters in large-scale datasets, is an important research topic in data mining, so research on clustering algorithms has considerable significance and broad prospects. The core of this thesis is improving on the shortcomings of K-means and of density-based clustering algorithms.

K-means has important application value in data mining, but as applications develop and new problems arise, its limitations have become apparent. First, different initial parameters can lead to different clustering results, and may even yield no solution. Second, it is a typical hill-climbing search method, so it easily converges to a local optimum. To address this, a new clustering algorithm, K-means based on the Shared Nearest Neighbor (KSNN), is designed. KSNN finds the core nodes of the data to obtain the number of clusters and passes that number to K-means as its parameter. This removes the requirement that the number of clusters for K-means be specified by hand, and it also gives better global convergence.

Next, the Clustering Algorithm Based On Node Priority (CABONW) is proposed as an effective solution for datasets of varying density encountered in practice. First, CABONW uses the nearest-neighbor method to construct the natural link relations among the nodes in the dataset. Second, it establishes node priorities, sorts the effective relations among data nodes, and creates a sequence chart. Finally, it performs a depth-first search of the sequence chart to create the clusters. A comparison with DBSCAN and OPTICS concludes that CABONW can handle datasets of varying density and is more efficient than DBSCAN and OPTICS.

Finally, the thesis designs a cluster analysis system prototype that incorporates KSNN, CABONW, and other clustering algorithms. The prototype can be used for teaching comparisons and for analysis of real datasets, and may be widely used in data mining. Through theoretical analysis and implementation, it is concluded that KSNN and CABONW solve the problems of K-means and density-based clustering algorithms; both were tested on the cluster analysis system prototype.
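The idea of deriving the K-means cluster count from shared-nearest-neighbor structure can be illustrated with a minimal sketch. This is not the KSNN algorithm from the thesis, whose definition of core nodes is not given in the abstract. As a stand-in assumption, two points are linked when each is among the other's k nearest neighbors and they share at least `min_shared` neighbors; the number of linked groups is taken as the cluster count, and the group means seed a plain Lloyd's K-means. All function names and parameters (`knn_lists`, `snn_components`, `snn_seeded_kmeans`, `k`, `min_shared`, `min_size`) are hypothetical.

```python
import math
from collections import deque


def knn_lists(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(set(order[:k]))
    return neighbors


def snn_components(points, k, min_shared):
    """Group points linked by mutual k-NN membership and enough shared neighbors.

    Assumption: this linking rule stands in for the thesis's core-node search.
    """
    nbrs = knn_lists(points, k)
    n = len(points)
    adjacency = [[] for _ in range(n)]
    for i in range(n):
        for j in nbrs[i]:
            if i in nbrs[j] and len(nbrs[i] & nbrs[j]) >= min_shared:
                adjacency[i].append(j)
    seen = [False] * n
    components = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        component = []
        while queue:  # breadth-first traversal of one linked group
            u = queue.popleft()
            component.append(u)
            for v in adjacency[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
        components.append(component)
    return components


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def kmeans(points, centers, iterations=20):
    """Plain Lloyd's K-means starting from the given centers."""
    centers = list(centers)
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = mean(members)
    return labels, centers


def snn_seeded_kmeans(points, k=3, min_shared=1, min_size=2):
    """Estimate the cluster count from SNN groups, then run K-means from their means."""
    groups = [g for g in snn_components(points, k, min_shared) if len(g) >= min_size]
    if not groups:
        raise ValueError("no SNN groups found; try a larger k or smaller min_shared")
    seeds = [mean([points[i] for i in g]) for g in groups]
    return kmeans(points, seeds)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    labels, centers = snn_seeded_kmeans(data)
    print(len(centers), labels)
```

Seeding K-means from the group means, rather than from random points, is one way to reduce sensitivity to initialization; whether the thesis does this is not stated in the abstract.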
Keywords/Search Tags:Data Mining, Cluster Analysis, KSNN, CABONW