The Research Of Clustering Algorithms For Mining Arbitrary Shaped Clusters

Posted on: 2017-04-16 | Degree: Master | Type: Thesis
Country: China | Candidate: B Wang
GTID: 2308330503961504 | Subject: Software engineering
Abstract/Summary:
Data mining is a powerful technology for discovering knowledge in massive data. Cluster analysis, a basic data-mining tool, has been applied in many areas, such as pattern recognition, image processing, spatial data analysis, text categorization, information retrieval, and market analysis. With the widespread use of computers and the growth of the Internet, the volume of data keeps increasing, and many kinds of data, such as geographic information data, medical image data, and agricultural science data, contain clusters with irregular shapes in their spatial distribution. This poses great challenges for cluster analysis, because traditional clustering algorithms generally cannot detect arbitrarily shaped clusters well; as a result, mining such clusters has recently become a focus of cluster analysis research.

To cluster data containing arbitrarily shaped clusters more effectively, we analyzed existing clustering algorithms and proposed two algorithms, CMSPC and CFDPm, both of which can detect arbitrarily shaped clusters. CMSPC aims to improve clustering quality on data containing arbitrarily shaped clusters and is based on the similarity between a point and the points in a cluster. For two objects within the cutoff distance of each other, we compute the degree of membership of one object toward the cluster containing the other, and merge the temporary clusters to which the two objects belong if this degree of membership reaches a given threshold. CMSPC recognizes clusters regardless of their shape and also detects outliers during the clustering procedure. The second algorithm, CFDPm, is a modification of the recent CFDP (clustering by fast search and find of density peaks) algorithm. For datasets with multi-peak clusters it is difficult to choose exact cluster center points, and an inaccurate choice of centers lowers the clustering quality of CFDP.
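The CMSPC merging rule described above can be illustrated with a minimal Python sketch. The abstract does not define the degree of membership, so this sketch assumes one: the fraction of a cluster's points that lie within the cutoff distance `dc` of the object. It also assumes that both objects of a pair must reach the threshold `tau` toward each other's cluster, that pairs are processed closest first, and that clusters smaller than `min_size` are reported as outliers. The function name `cmspc_sketch` and the parameters `tau` and `min_size` are hypothetical; this is not the thesis's implementation.

```python
import math
from itertools import combinations


def cmspc_sketch(points, dc, tau, min_size=2):
    """Hypothetical sketch of membership-threshold cluster merging.

    points: list of coordinate tuples; dc: cutoff distance;
    tau: membership threshold in [0, 1]; clusters with fewer than
    min_size points are reported as outliers (label -1).
    """
    n = len(points)
    # Every point starts in its own temporary cluster (union-find forest).
    parent = list(range(n))
    members = {i: {i} for i in range(n)}

    def find(i):
        # Path halving keeps the trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def membership(i, root):
        # Assumed definition: share of the cluster's points that lie
        # within the cutoff distance dc of point i.
        cluster = members[root]
        close = sum(1 for p in cluster if math.dist(points[i], points[p]) <= dc)
        return close / len(cluster)

    # Consider only pairs within the cutoff distance, closest pairs first.
    pairs = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    for d, i, j in pairs:
        if d > dc:
            break
        ri, rj = find(i), find(j)
        if ri == rj:
            continue
        # Merge only if each object belongs strongly enough to the other's cluster.
        if min(membership(i, rj), membership(j, ri)) >= tau:
            parent[rj] = ri
            members[ri] |= members.pop(rj)

    # Relabel surviving clusters as 0, 1, ...; small clusters become outliers.
    labels, roots = [], {}
    for i in range(n):
        r = find(i)
        if len(members[r]) < min_size:
            labels.append(-1)
        else:
            roots.setdefault(r, len(roots))
            labels.append(roots[r])
    return labels
```

For example, with `dc=1.5` and `tau=1.0`, the collinear points `(0, 0)`, `(1, 0)`, `(2, 0)` do not all merge, because only half of the cluster `{(0, 0), (1, 0)}` lies within the cutoff of `(2, 0)`, so `(2, 0)` is labelled `-1`; lowering `tau` to 0.5 merges all three. The threshold is what lets a membership rule refuse weak links that plain single-linkage would accept.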
By systematically and comprehensively considering the distances between clusters in the clustering result, the intra-cluster distances, and the effect of merging two clusters on the overall Davies-Bouldin Index, CFDPm merges clusters conditionally, thereby recovering the clustering quality lost to inaccurate center selection in the CFDP procedure. To test the effectiveness of the two proposed algorithms, we conducted clustering experiments on benchmark datasets. The experimental results show that, first, CMSPC detects arbitrarily shaped clusters with high clustering quality and identifies outliers; second, CFDPm is expected to solve CFDP's problem of lowered clustering quality on datasets with multi-peak clusters caused by inaccurate selection of cluster centers.
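The conditional merging idea can be illustrated with a Python sketch of its two ingredients: the standard CFDP steps (local density rho with a cutoff kernel, distance delta to the nearest denser point, and centers chosen by the largest gamma = rho * delta) and the Davies-Bouldin index, computed here with mean-distance scatter. The merge rule in `merge_by_dbi` is an assumption: it repeatedly applies the single merge that lowers the index most and stops when no merge helps, whereas the thesis also weighs inter-cluster and intra-cluster distances in ways the abstract does not specify. All function names are hypothetical.

```python
import math
from itertools import combinations


def density_peaks(points, dc, k):
    """CFDP steps with a cutoff kernel; returns labels 0..k-1."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # rho_i: number of other points strictly closer than the cutoff dc.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < dc) for i in range(n)]
    # Rank points by decreasing density; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta, nearest_higher = [0.0] * n, [-1] * n
    delta[order[0]] = max(d[order[0]])
    for rank in range(1, n):
        i = order[rank]
        # delta_i: distance to the closest point ranked denser than i.
        j = min(order[:rank], key=lambda h: d[i][h])
        delta[i], nearest_higher[i] = d[i][j], j
    # Centers: the densest point (so every chain ends at a center)
    # plus the k-1 other points with the largest gamma = rho * delta.
    by_gamma = sorted(range(n), key=lambda i: -rho[i] * delta[i])
    centers = [order[0]] + [i for i in by_gamma if i != order[0]][: k - 1]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Non-centers inherit the label of their nearest denser neighbour,
    # visited in density order so that neighbour is already labelled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels


def davies_bouldin(points, labels):
    """Davies-Bouldin index (lower is better); 0.0 for a single cluster."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    if len(groups) < 2:
        return 0.0
    cent, scat = {}, {}
    for lab, pts in groups.items():
        cent[lab] = tuple(sum(c) / len(pts) for c in zip(*pts))
        scat[lab] = sum(math.dist(p, cent[lab]) for p in pts) / len(pts)

    def ratio(a, b):
        # Coincident centroids make the ratio unbounded.
        sep = math.dist(cent[a], cent[b])
        return math.inf if sep == 0 else (scat[a] + scat[b]) / sep

    return sum(max(ratio(a, b) for b in groups if b != a) for a in groups) / len(groups)


def merge_by_dbi(points, labels):
    """Hypothetical CFDPm-style pass: apply the merge that lowers the
    Davies-Bouldin index most, repeatedly, until no merge lowers it."""
    labels = list(labels)
    while len(set(labels)) > 2:  # the index needs at least two clusters
        best, best_pair = davies_bouldin(points, labels), None
        for a, b in combinations(sorted(set(labels)), 2):
            trial = [a if lab == b else lab for lab in labels]
            score = davies_bouldin(points, trial)
            if score < best:
                best, best_pair = score, (a, b)
        if best_pair is None:
            break
        a, b = best_pair
        labels = [a if lab == b else lab for lab in labels]
    return labels
```

On six points at x = 0, 1, 2, 10, 11, 12 (y = 0) with `dc=1.5`, asking `density_peaks` for three centers splits the left group as `[2, 0, 0, 1, 1, 1]`; `merge_by_dbi` then merges the stray center back, giving `[0, 0, 0, 1, 1, 1]`. Greedy single merges keep each step easy to audit, at the cost of re-scoring every cluster pair per step.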
Keywords/Search Tags: Arbitrary shaped clusters, Clustering, Spatial data, Similarity measure