
The Research About Partition-based And Density-based Clustering Algorithm

Posted on: 2012-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Yu
GTID: 2218330338470817
Subject: Computer application technology
Abstract/Summary:
In recent years, with the development of the IT industry, data mining has been widely applied. Data mining mainly includes association analysis, classification, clustering, and so on, and clustering is an important research direction within it. Traditional clustering algorithms include partition-based, hierarchical, density-based, grid-based, and model-based methods. Clustering can effectively handle large amounts of data that carry no class labels, and it is widely used in finance, biology, astronomy, and other fields.

This thesis first introduced the concepts of data mining and then the traditional clustering algorithms, of which partition-based and density-based clustering are the most commonly used. However, traditional clustering algorithms have many problems. For example, they are easily affected by the input order of the data and by isolated points (outliers), which reduces clustering quality. This thesis therefore analyzed the K-Means and DBSCAN algorithms and proposed improvements that effectively raised clustering quality.

K-Means is a classical clustering algorithm with many advantages but also several disadvantages: the number of clusters must be chosen in advance, the initial cluster centres are chosen at random, the algorithm easily converges to a locally optimal solution, and isolated points affect it severely. This thesis mainly improved the choice of initial cluster centres and the handling of isolated points. First, the algorithm computed the distances between all data points in order to eliminate the effect of isolated points. The thesis then proposed a new method for choosing the initial cluster centres, and the improved algorithm was compared with the original algorithm experimentally.
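The two K-Means steps described above (computing all pairwise distances to suppress isolated points, then choosing the initial centres) can be sketched as follows. The abstract gives neither the exact outlier rule nor the new initialization method, so this is only a minimal sketch under stated assumptions: a point is treated as isolated when the mean distance to its `m` nearest neighbours exceeds `factor` times the data-set average (a hypothetical criterion), and the centres are seeded with a farthest-first scheme, which is a commonly used deterministic stand-in, not necessarily the thesis's method. All function names and parameters are illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def remove_isolated_points(points, m=3, factor=2.0):
    """Split points into (kept, isolated).

    Hypothetical rule: a point is isolated when the mean distance to its m
    nearest neighbours exceeds `factor` times the average of that value over
    all points. Requires len(points) >= 2.
    """
    n = len(points)
    # Full pairwise distance matrix ("the distance between all data").
    dist = [[euclidean(points[i], points[j]) for j in range(n)] for i in range(n)]
    knn_mean = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:m]
        knn_mean.append(sum(nearest) / len(nearest))
    threshold = factor * sum(knn_mean) / n
    kept = [p for p, d in zip(points, knn_mean) if d <= threshold]
    isolated = [p for p, d in zip(points, knn_mean) if d > threshold]
    return kept, isolated


def farthest_first_centres(points, k):
    """Assumed seeding scheme: start at the point nearest the data mean, then
    repeatedly add the point farthest from all centres chosen so far."""
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    centres = [min(points, key=lambda p: euclidean(p, mean))]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(euclidean(p, c) for c in centres)))
    return centres


def assign(points, centres):
    """Group each point with its nearest centre."""
    clusters = [[] for _ in centres]
    for p in points:
        idx = min(range(len(centres)), key=lambda c: euclidean(p, centres[c]))
        clusters[idx].append(p)
    return clusters


def kmeans(points, k, max_iter=100):
    """Standard Lloyd iterations starting from the farthest-first centres."""
    centres = farthest_first_centres(points, k)
    for _ in range(max_iter):
        clusters = assign(points, centres)
        # Recompute each centre as the mean of its cluster (keep it if empty).
        new_centres = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else tuple(centres[i])
            for i, cl in enumerate(clusters)
        ]
        if new_centres == [tuple(c) for c in centres]:
            break  # converged: centres no longer move
        centres = new_centres
    return centres, assign(points, centres)


data = [(1, 1), (1, 2), (2, 1), (2, 2),
        (8, 8), (8, 9), (9, 8), (9, 9),
        (50, 50)]
kept, isolated = remove_isolated_points(data)
centres, clusters = kmeans(kept, 2)
print(isolated)  # [(50, 50)]
print(centres)   # [(1.5, 1.5), (8.5, 8.5)]
```

Because isolated points are removed before seeding, the farthest-first step cannot pick the outlier `(50, 50)` as a centre, and the deterministic seeding removes the run-to-run randomness of picking initial centres at random.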
The experiments showed that the improved K-Means algorithm was markedly less affected by isolated points and that its clustering results closely approached the actual distribution of the data.

DBSCAN is a density-based algorithm that can discover clusters of arbitrary shape in the presence of noise. However, it is sensitive to its input parameters: because it uses a single global Eps, clustering quality degrades greatly when the data density is uneven and the distances between clusters are large. This thesis mainly improved the choice of Eps in order to handle unevenly distributed data. It proposed a new data-partitioning method that clusters the values on the vertical axis of the sorted k-dist graph (each point's distance to its k-th nearest neighbour) and partitions the data accordingly, so that the density within each partition is uniform. Experimental results showed that the improved algorithm significantly alleviated the degradation of clustering quality and produced more accurate clustering results.
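The k-dist partitioning idea described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis's algorithm: the 1-D clustering of k-dist values is replaced by a hypothetical rule that cuts the sorted values wherever one exceeds its predecessor by more than a factor `jump`; each partition's local Eps is taken as the largest k-dist inside it; and MinPts is assumed to be `k + 1`. The DBSCAN routine itself is the classic core/border/noise procedure. All names are illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_dist(points, k):
    """Distance from each point to its k-th nearest neighbour (needs len(points) > k)."""
    values = []
    for i, p in enumerate(points):
        d = sorted(euclidean(p, q) for j, q in enumerate(points) if j != i)
        values.append(d[k - 1])
    return values


def partition_by_k_dist(points, k, jump=2.0):
    """Group the k-dist values (vertical axis of the sorted k-dist graph) into
    density levels and return [(partition_points, local_eps), ...].

    Hypothetical 1-D grouping: walk the sorted values and start a new group
    wherever a value exceeds its predecessor by more than a factor of `jump`.
    """
    kd = k_dist(points, k)
    order = sorted(range(len(points)), key=lambda i: kd[i])
    groups, current = [], [order[0]]
    for prev, nxt in zip(order, order[1:]):
        if kd[nxt] > jump * kd[prev]:
            groups.append(current)
            current = []
        current.append(nxt)
    groups.append(current)
    # Local Eps for each partition: the largest k-dist inside it (an assumption).
    return [([points[i] for i in g], max(kd[i] for i in g)) for g in groups]


def dbscan(points, eps, min_pts):
    """Classic DBSCAN; returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points) if euclidean(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nb)
    return labels


def partitioned_dbscan(points, k=3, jump=2.0):
    """Run DBSCAN inside each k-dist partition with that partition's own Eps.
    MinPts = k + 1 (the point plus k neighbours) is an assumed pairing."""
    clusters, noise = [], []
    for part, eps in partition_by_k_dist(points, k, jump):
        labels = dbscan(part, eps, k + 1)
        for c in sorted({lab for lab in labels if lab != -1}):
            clusters.append([p for p, lab in zip(part, labels) if lab == c])
        noise.extend(p for p, lab in zip(part, labels) if lab == -1)
    return clusters, noise


dense = [(x, y) for x in range(3) for y in range(3)]                     # spacing 1
sparse = [(50 + 5 * x, 50 + 5 * y) for x in range(3) for y in range(3)]  # spacing 5
clusters, noise = partitioned_dbscan(dense + sparse + [(200, 200)])
print([len(c) for c in clusters], noise)  # [9, 9] [(200, 200)]
```

For comparison, running `dbscan` once on `dense + sparse` with the dense group's Eps (√2) and MinPts 4 labels every point of the sparse group as noise, which is the global-Eps problem the abstract describes; giving each partition its own Eps recovers both groups.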
Keywords/Search Tags: Data Mining, Cluster Analysis, K-Means, Isolated Point, DBSCAN, Data Partition