Density-sensitive K-means Clustering Algorithm

Posted on: 2015-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: J J Wang
GTID: 2268330425995890
Subject: Computer software and theory
Abstract/Summary:
Data mining techniques have become increasingly important in today's era of big data. Airline passenger ticket information, bank transaction records for each customer, online shopping records, and the selling prices of goods in every major supermarket all signal that large volumes of data are emerging. How to store this data has become a key issue in information technology research, and data mining technology is undoubtedly the way to solve this problem. Big data is a new research area both at home and abroad, and researchers who pay more attention to it will be well placed to lead the field.

The K-means clustering algorithm is the most frequently used clustering algorithm in data mining. Many domestic and foreign scholars have studied and improved it in depth, but the algorithm still has several drawbacks that have not been fully overcome: it has difficulty clustering data sets of non-convex shape, it is easily disturbed by noise points, its accuracy on high-dimensional data sets is low, and the number of clusters must be specified before clustering. This thesis therefore proposes improvements that address these shortcomings and greatly enhance the algorithm's versatility. The main work includes:

1. Proposes the RtK-means clustering algorithm based on triangulation. It first builds the initial Delaunay triangulation mesh, then deletes the long edges of the mesh to obtain a more accurate number of clusters on the basis of the overall clustering. The algorithm effectively addresses the difficulty K-means has with non-convex data sets. When the clusters are relatively far apart, clustering is easier and more accurate results are obtained on artificial data sets.

2. Gives a quick way to handle edge data points within a local neighborhood of a given radius. Specifically, for the points whose long edges were deleted, a neighborhood radius r is selected, and the points within that radius are clustered directly using a Gaussian kernel function. This addresses the difficulty of clustering the edge points of manifold data and also the interference of noise points.

3. Proposes the PK-means algorithm based on spectral clustering, which applies spectral clustering to K-means clustering of high-dimensional nonlinear data. The algorithm improves K-means in two respects:

(1) An automatic approach to determining the initial number of clusters. The relative density of each data point is calculated before clustering, all data points are sorted in descending order of density, and some of the points with the highest relative density are selected as cluster centers, which determines the number of clusters k. Verified on UCI data sets, the clustering effect is remarkable.

(2) A similarity measure based on fuzzy clustering. The membership matrix produced by the FCM algorithm is used to determine the similarity between elements: the membership matrix decides whether two different points belong to the same cluster, and this in turn determines their similarity. This similarity measure addresses the sensitivity of spectral clustering to its parameters, and the clustering effect on high-dimensional data sets is remarkable.
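The density-based choice of the number of clusters in item 3(1) can be illustrated with a minimal sketch. The abstract does not define "relative density" or say how many high-density points are kept, so the definitions here are assumptions made only for illustration: density is the count of neighbors within a hypothetical `radius`, and a point becomes a center only if it lies at least a hypothetical `min_separation` away from every center already chosen. The function name `estimate_centers` is likewise invented and is not taken from the thesis.

```python
import math


def estimate_centers(points, radius, min_separation):
    """Pick high-density points as initial centers; k is how many are picked.

    Assumed definitions (not from the thesis): density is the number of
    other points within `radius`; centers must be `min_separation` apart.
    """
    density = [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]
    # Visit points from densest to sparsest (ties keep their input order).
    order = sorted(range(len(points)), key=lambda i: density[i], reverse=True)
    centers = []
    for i in order:
        if density[i] == 0:
            break  # points with no neighbors are treated as noise
        if all(math.dist(points[i], c) >= min_separation for c in centers):
            centers.append(points[i])
    return centers


points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),  # first dense group
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),  # second dense group
    (10.0, 0.0),                                     # isolated noise point
]
centers = estimate_centers(points, radius=0.5, min_separation=2.0)
k = len(centers)
print(k, centers)
```

On this toy data the sketch selects one center per dense group and ignores the isolated point, so k comes out as 2. The selected centers could then seed an ordinary K-means run. Both thresholds are tuning choices in this sketch, and the thesis may determine them differently.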
Keywords/Search Tags: K-means clustering algorithm, Delaunay triangulation, spectral clustering, FCM algorithm, number of clusters