Research On Improvement To Partitioning Clustering Algorithm And Density-based Clustering Algorithm

Posted on:2008-09-28

Degree:Master

Type:Thesis

Country:China

Candidate:Y J C Zhang

GTID:2178360242967329

Subject:Software engineering

Abstract/Summary:

Data mining extracts previously unknown but useful knowledge from datasets that are massive, incomplete, noisy, fuzzy, and random. Cluster analysis, which discovers unknown clusters in large-scale datasets, is an important research topic in data mining, so research on clustering algorithms has both vital significance and broad prospects. The core of this thesis is to remedy the shortcomings of K-means and of density-based clustering algorithms.

K-means has extremely important application value in data mining, but as applications develop and new problems arise, its limitations have become apparent. First, different initial parameters can lead to different clustering results, or even to no solution. Second, it is a typical hill-climbing search method and therefore easily converges to a local optimum. To address this, a new clustering algorithm, K-means based on the Shared Nearest Neighbor (KSNN), is designed. KSNN finds the core nodes of the data to obtain the number of clusters and passes that number to K-means as its parameter. This removes the need for the number of clusters in K-means to be specified by hand, and it also gives better global convergence.

Next, the Clustering Algorithm Based On Node Priority (CABONW) offers an effective solution for the datasets of varying density that occur in practice. First, CABONW uses the nearest-neighbor method to construct the natural link relations between nodes in the dataset. Second, it establishes node priorities, sorts the effective relations between data nodes accordingly, and creates a sequence chart. Finally, it performs a depth-first search on the sequence chart to create the clusters. A comparison with DBSCAN and OPTICS concludes that CABONW can handle datasets of varying density and is more efficient than both DBSCAN and OPTICS.

Finally, the thesis designs a cluster analysis system prototype that incorporates KSNN, CABONW, and other clustering algorithms. The prototype can be used for teaching comparisons and for the analysis of real datasets, and may be widely used in data mining. Theoretical analysis and implementation lead to the conclusion that KSNN and CABONW solve the stated problems of K-means and of density-based clustering algorithms; both were tested on the cluster analysis system prototype.
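The KSNN idea summarized above (find core nodes, derive the number of clusters from them, then run K-means) can be illustrated with a minimal Python sketch. The abstract does not define "core node", so everything specific here is an assumption rather than the thesis's actual method: two points are treated as strongly linked when each is in the other's k-nearest-neighbor list and they share at least `min_shared` neighbors, a core node is a point with at least `min_links` strong links, and each connected group of core nodes counts as one cluster. Seeding K-means with the centroids of those groups is likewise an illustrative choice, and all function and parameter names are hypothetical.

```python
import math


def knn_sets(points, k):
    """For each point, return the index set of its k nearest neighbours."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        result.append(set(others[:k]))
    return result


def snn_core_groups(points, k=3, min_shared=2, min_links=2):
    """Group core points by shared-nearest-neighbour links (assumed definition)."""
    nn = knn_sets(points, k)
    n = len(points)

    def linked(i, j):
        # Strong link: mutual k-nearest neighbours that share enough neighbours.
        mutual = j in nn[i] and i in nn[j]
        return mutual and len(nn[i] & nn[j]) >= min_shared

    links = [[j for j in range(n) if j != i and linked(i, j)] for i in range(n)]
    core = {i for i in range(n) if len(links[i]) >= min_links}

    # Connected components among core points, found by depth-first search.
    groups, seen = [], set()
    for start in sorted(core):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            u = stack.pop()
            group.append(u)
            for v in links[u]:
                if v in core and v not in seen:
                    seen.add(v)
                    stack.append(v)
        groups.append(sorted(group))
    return groups


def centroid(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(members) for axis in zip(*members))


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iteration starting from the given centers."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = centroid(members)
    return labels, centers


def ksnn_sketch(points, k=3, min_shared=2, min_links=2):
    """Estimate the cluster count from core groups, then run K-means."""
    groups = snn_core_groups(points, k, min_shared, min_links)
    if not groups:
        raise ValueError("no core points found; relax the thresholds")
    # Assumption: seed K-means with one centroid per core group.
    seeds = [centroid([points[i] for i in g]) for g in groups]
    return kmeans(points, seeds)
```

For two well-separated 2x2 squares of points, for example `[(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]`, `snn_core_groups` returns two groups, so K-means runs with two clusters without the count being supplied by hand. The thresholds `k`, `min_shared`, and `min_links` still have to be chosen, so this sketch moves the tuning burden to different parameters and does not eliminate it.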

Keywords/Search Tags:

Data Mining, Cluster Analysis, KSNN, CABONW

Related items

1	Based On The Application Of Cluster Analysis Of Water Pollution Monitoring System
2	Cluster Analysis In Data Mining And Its Control In Applied Research
3	Web Cluster System QoS Control Mechanism Based On Data Mining
4	The Application Of Cluster Analysis Algorithm In HMIS
5	The Design And Realization Of Cluster Mining System Based On Data Warehouse And OLAP Technology
6	Study On Cluster Analysis And Rule Mining Based On Granular Theory
7	Data Mining Technology And Its Application In The Supermarket In CRM
8	Cluster Sowntown And Application Study Based On Least Cluster Cell
9	Design And Application Of Data Platform Based On Cluster Analysis
10	Patent Information Analysis And Applied Research, Based On Data Mining Technology