
Research On Partitioned Clustering Algorithm Based On Granular Computing And Density Peak

Posted on: 2017-09-01
Degree: Master
Type: Thesis
Country: China
Candidate: X X Lu
GTID: 2358330512468068
Subject: Computer application technology
Abstract/Summary:
As one of the most important techniques in data mining, cluster analysis is widely used in fields such as bioinformatics, Web search engines and business intelligence. With the rapid development of society, large amounts of data are produced in daily life. How to extract useful information from these data, in order to analyze past and current activities and to predict future trends, is an urgent problem, and clustering and other data mining techniques emerged in response. Clustering is an unsupervised method: it groups similar objects into the same cluster and places dissimilar objects in different clusters, so that the resulting clusters form a partition of the dataset. A common clustering criterion is high similarity within clusters together with low similarity between clusters.

Partitioning clustering is a classical approach that has attracted much attention for its efficiency and simplicity. However, it also has defects: the number of clusters, K, must be specified before clustering, and the initial seeds are chosen randomly. How can the need to predefine K be removed? How can better initial seeds be selected? How can the similarity measure be optimized? These are the main research topics for partitioning clustering algorithms. Better initial seeds and a better similarity measure help to find more compact clusters and improve clustering performance, so this thesis focuses on these three key points. The main innovations are as follows:

1. Combining granular computing with the max-min distance method, this thesis proposes a new K-medoids approach comprising two algorithms for choosing better initial seeds.
It aims to overcome a defect of the fast K-medoids clustering algorithm, which may choose the initial seeds for different clusters from within the same cluster. In addition, the mean similarity between instances is used as the threshold for constructing the defuzzified similarity matrix, which removes the arbitrariness with which our earlier K-medoids clustering algorithm based on granular computing determined this threshold. First, the dataset is granulated into granules that serve as candidate sets of initial seeds. Second, the optimal initial centers, which lie in dense areas and far apart from each other, are selected. Finally, the proposed algorithm is tested on synthetically generated datasets and on datasets from the UCI Machine Learning Repository. The experimental results, in terms of clustering accuracy, Adjusted Rand Index and other measures, demonstrate that the proposed K-medoids algorithm is superior to the traditional K-medoids algorithm, the fast K-medoids algorithm and our previously proposed K-medoids clustering algorithm based on granular computing.

2. This thesis extends the DPC (Clustering by fast search and find of Density Peaks) algorithm, published in Science in 2014, to eliminate the subjectivity in selecting the cutoff distance. The extended DPC algorithm detects the density peaks, which serve as the initial seeds for the K-modes algorithm, and the number of density peaks gives the number of clusters. We also define a new dissimilarity measure for the K-modes algorithm, which increases the similarity within clusters and helps to find more compact clusters. Combining the extended DPC algorithm with the new dissimilarity measure, we propose a new K-modes algorithm that overcomes the shortcomings of the traditional K-modes algorithm, namely that its clustering results depend heavily on the random initial seeds and that the number of clusters must be specified before clustering.
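For orientation, the density-peak selection that the second contribution builds on can be sketched roughly as follows. This is a minimal illustration of the original DPC idea, not the extended algorithm proposed in the thesis. It assumes numeric points with Euclidean distance and a hand-picked cutoff distance, which is exactly the parameter the thesis aims to stop choosing subjectively, and it takes the number of peaks as an input, whereas the thesis derives the number of clusters from the detected peaks. The names `density_peaks`, `d_c` and `k` are hypothetical.

```python
from math import dist


def density_peaks(points, d_c, k):
    """Return the indices of the k points with the largest rho * delta.

    Assumptions (not from the thesis): Euclidean distance, a manually
    chosen cutoff d_c, and a caller-supplied number of peaks k.
    """
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]

    # Local density rho: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < d_c)
           for i in range(n)]

    delta = []
    for i in range(n):
        # Distance to the nearest point of strictly higher density.
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbour gets its largest distance instead.
        delta.append(min(higher) if higher else max(d[i]))

    # Peaks combine high density with a large distance to denser points.
    gamma = [r * dl for r, dl in zip(rho, delta)]
    return sorted(range(n), key=lambda i: gamma[i], reverse=True)[:k]


# Two well-separated toy groups; each has four corners and one central point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
print(density_peaks(points, d_c=1.0, k=2))
```

In this toy data the two central points have the highest density and are far from any denser point, so they rank first and would be used as initial seeds.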
The validity of the new dissimilarity measure and the performance of the new K-modes algorithms are tested on datasets from the UCI Machine Learning Repository. The experimental results demonstrate that the new dissimilarity measure is effective, that the new K-modes algorithms perform better than the existing Huang's K-modes and Ng's K-modes algorithms in terms of clustering accuracy and Adjusted Rand Index, and that they can cluster big data with categorical attributes.
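The Adjusted Rand Index used in the evaluation compares a clustering with reference labels while correcting for chance agreement. The sketch below is a plain-Python version of the standard pair-counting formula, given only to make the metric concrete. The function name `adjusted_rand_index` is hypothetical, at least two samples are assumed, and nothing here reproduces the thesis's experiments or results.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Adjusted Rand Index between two labelings of the same samples.

    Assumes len(truth) == len(pred) >= 2. Label names themselves do not
    matter; only which samples are grouped together.
    """
    n = len(truth)
    # Pairs of samples that share a cluster in both labelings.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    # Pairs that share a cluster in each labeling taken separately.
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2        # largest attainable agreement
    if max_index == expected:
        # Degenerate case, e.g. both labelings put everything in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Same grouping with swapped label names, then a grouping that cuts across it.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

A value of 1 means the two partitions agree exactly, values near 0 mean agreement no better than chance, and negative values mean agreement worse than chance.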
Keywords/Search Tags: initial seeds, granular computing, density peaks, max-min distance method, neighborhood