
Research On Chameleon Clustering Algorithm Based On Nearest Neighbor

Posted on: 2021-03-15  Degree: Master  Type: Thesis
Country: China  Candidate: D D Lv
GTID: 2428330626962884  Subject: Mathematics
Abstract/Summary:
In recent years, advances in science and technology have greatly improved our ability to collect and store data, so that massive amounts of data are now available. Finding useful information in such data has become an important problem, and data mining technology offers a way to solve it. Data mining combines a variety of data analysis methods with algorithms that can process massive amounts of data, providing new ways both to explore new types of data and to process existing types of data with new methods. There are four main types of data mining, of which cluster analysis is the most widely used; it has applications in applied statistics, information retrieval, biological research, and business.

The Chameleon clustering algorithm is an agglomerative hierarchical clustering algorithm. It combines an initial partitioning of the data, produced by an efficient graph partitioning algorithm, with a hierarchical clustering scheme, and it uses a novel similarity function to merge the resulting subclusters into the final clustering. Chameleon can handle data sets whose clusters differ in shape, density, and size, but it still has limitations: it requires parameters to be set at several key stages, and its results are sensitive to these parameters; moreover, the distance-based similarity measure it uses is not suitable for high-dimensional data, which leads to poor clustering results on such data.

This thesis studies the Chameleon clustering algorithm. The specific research content and results are as follows:

1. A Chameleon clustering algorithm based on natural neighbors (the NN-Chameleon algorithm) is proposed. It addresses three problems of the traditional Chameleon algorithm: the parameter k must be entered manually when the k-nearest-neighbor graph is built in the sparsification phase; the termination of subcluster merging must be guided manually; and noise points are left unprocessed. First, the concept of natural neighbors is used to build a weighted natural-neighbor graph in the sparsification phase. Then, in the graph partitioning phase, a density peak algorithm improved with natural neighbors divides the natural-neighbor graph into initial subclusters. Finally, modularity, a measure used to detect community structure in complex networks, determines the final number of clusters and guides the merging of subclusters. The improved algorithm was tested on UCI data sets and synthetic data sets and compared with five clustering algorithms. The experimental results show that it scores better on three commonly used clustering evaluation indices and produces better clusterings.

2. A Chameleon algorithm based on shared neighbors (the SNN-Chameleon algorithm) is proposed. When the traditional Chameleon algorithm clusters high-dimensional data sets, distance-based similarity is no longer appropriate, which leads to poor clustering results. In this thesis, shared neighbors are used to measure the similarity between data points and to build a weighted shared-neighbor graph. The graph is then divided into subclusters of roughly equal size by recursive bisection and flood fill. Finally, following the first-truncation method, characteristics of the dendrogram produced during clustering are used to guide the similarity function in merging subclusters into the final clustering. In experiments, the proposed algorithm was compared with the Chameleon and M-Chameleon clustering algorithms; the results show that it has certain advantages when clustering high-dimensional data sets.
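The similarity function Chameleon uses to merge subclusters can be sketched from the original Chameleon paper (Karypis, Han and Kumar, 1999), which scores a candidate pair of clusters as RI · RC^α, where RI is relative interconnectivity and RC is relative closeness. The function below is a minimal sketch, not code from this thesis: it assumes the min-cut bisector statistics of each cluster have already been computed by a graph partitioner, and the parameter names and the default `alpha=2.0` are illustrative assumptions.

```python
def chameleon_similarity(cross_weights, ec_i, ec_j, sec_i, sec_j, n_i, n_j, alpha=2.0):
    """Chameleon merge score RI * RC**alpha for clusters Ci and Cj (sketch).

    cross_weights: weights of the edges that connect Ci and Cj.
    ec_i, ec_j:    total weight of the min-cut bisector of Ci and of Cj
                   (assumed precomputed by a graph partitioner).
    sec_i, sec_j:  average edge weight in those bisectors.
    n_i, n_j:      number of points in Ci and Cj.
    alpha:         user weight on closeness (illustrative default).
    """
    if not cross_weights:
        # Clusters with no connecting edges are never merged.
        return 0.0
    ec_ij = sum(cross_weights)                 # absolute interconnectivity
    sec_ij = ec_ij / len(cross_weights)        # average connecting edge weight
    # Relative interconnectivity: cross weight vs. mean internal cut weight.
    ri = ec_ij / ((ec_i + ec_j) / 2)
    # Relative closeness: cross edge strength vs. size-weighted internal strength.
    total = n_i + n_j
    rc = sec_ij / (n_i / total * sec_i + n_j / total * sec_j)
    return ri * rc ** alpha
```

In an agglomerative loop, the pair with the highest score would be merged first; larger `alpha` favors pairs whose connecting edges are as strong as their internal ones.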
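Natural neighbors remove the need to choose k by hand: the neighborhood size r grows until the neighbor relation stabilizes, and the final r (often called the natural eigenvalue λ) replaces the manual k. The sketch below is a simplified natural-neighbor search under stated assumptions: it stops when every point appears in some other point's r-nearest-neighbor list, or when the number of points with no such "reverse neighbor" stops changing, and it defines the natural neighbors of a point as its mutual λ-nearest neighbors. The exact stopping rule and edge weights used in the thesis are not given in the abstract, so the function name and details are hypothetical.

```python
import math


def natural_neighbors(points):
    """Simplified natural-neighbor search (illustrative sketch).

    Returns (lam, nan): lam is the round at which the search stopped,
    and nan[i] is the set of points j such that i and j are each among
    the other's lam nearest neighbors.
    """
    n = len(points)
    # For every point, all other points ordered by Euclidean distance.
    order = [
        [j for _, j in sorted((math.dist(points[i], points[j]), j)
                              for j in range(n) if j != i)]
        for i in range(n)
    ]
    reverse_count = [0] * n  # how many points list i among their r nearest
    lam, prev_orphans = 0, None
    while lam < n - 1:
        lam += 1
        # Each point adds its lam-th nearest neighbor to its neighbor list.
        for i in range(n):
            reverse_count[order[i][lam - 1]] += 1
        orphans = reverse_count.count(0)
        # Stop when every point has a reverse neighbor, or when the
        # number of "orphan" points stops changing between rounds.
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    knn = [set(order[i][:lam]) for i in range(n)]
    # Natural neighbors: mutual lam-nearest neighbors.
    nan = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return lam, nan
```

For example, with points `(0,0), (1,0), (2,0), (50,0)` the search stops at λ = 2 and the distant point has no natural neighbors, which is one way such a graph can expose noise points.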
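The graph partitioning phase of NN-Chameleon builds on the density peak idea. The sketch below shows only the classic density peak clustering of Rodriguez and Laio (2014), not the thesis's natural-neighbor improvement: each point gets a local density ρ (here, the number of points within a cutoff distance `dc`, an assumed parameter) and a distance δ to its nearest denser point; points with the largest ρ·δ become centers, and every other point inherits the label of its nearest denser point.

```python
import math


def density_peaks(points, dc, n_centers):
    """Classic density peak clustering (sketch, cutoff-kernel density)."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other points closer than the cutoff dc.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < dc) for i in range(n)]
    # Visit points from highest to lowest density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    parent = [-1] * n  # nearest point of higher density
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: by convention delta is its largest distance.
            delta[i] = max(d[i])
            continue
        j = min(order[:rank], key=lambda m: d[i][m])
        delta[i], parent[i] = d[i][j], j
    # Centers: largest rho * delta; the densest point has no parent,
    # so it is always taken as a center.
    ranked = sorted(range(n), key=lambda i: -rho[i] * delta[i])
    centers = [order[0]] + [i for i in ranked if i != order[0]][:n_centers - 1]
    label = [-1] * n
    for c, i in enumerate(centers):
        label[i] = c
    # Denser points are labeled first, so each parent already has a label.
    for i in order:
        if label[i] == -1:
            label[i] = label[parent[i]]
    return label
```

Because labels propagate along nearest-denser links, each resulting group is a connected, density-ordered chain, which is why such a step can serve as the initial subcluster split before merging.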
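Modularity, which NN-Chameleon borrows from community detection, measures how much more edge weight falls inside communities than a random graph with the same degrees would predict: Q = Σ_c [L_c / m − (D_c / 2m)²], where m is the total edge weight, L_c the weight inside community c, and D_c the total degree of c. A minimal sketch of that standard formula for a weighted undirected graph follows; the edge-dictionary format is an assumption made for illustration.

```python
def modularity(edges, labels):
    """Newman modularity of a partition of a weighted undirected graph.

    edges:  dict mapping (i, j) to a weight, each undirected edge once.
    labels: labels[i] is the community of node i.
    """
    m = sum(edges.values())
    inside = {}  # L_c: weight of edges with both ends in community c
    degree = {}  # D_c: sum of weighted degrees of the nodes in c
    for (i, j), w in edges.items():
        for node in (i, j):
            c = labels[node]
            degree[c] = degree.get(c, 0.0) + w
        if labels[i] == labels[j]:
            inside[labels[i]] = inside.get(labels[i], 0.0) + w
    return sum(inside.get(c, 0.0) / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

One plausible way to use it while merging is to compute Q for the partition at every merge step and keep the step with the highest value, which fixes the number of clusters without manual input; the abstract does not state whether the thesis applies exactly this rule.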
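Shared-neighbor similarity replaces raw distances, which become less informative in high dimensions, with a count of common neighbors. The sketch below follows the Jarvis-Patrick style definition as an assumption: two points are linked only if each is among the other's k nearest neighbors, and the edge weight is the number of k-nearest neighbors they share (the points themselves are not counted). The thesis's exact weighting may differ, and the function name is hypothetical.

```python
import math


def snn_graph(points, k):
    """Weighted shared-nearest-neighbor graph (illustrative sketch).

    Returns a dict mapping (i, j) with i < j to the number of k-nearest
    neighbors that i and j share.
    """
    n = len(points)
    knn = []
    for i in range(n):
        ranked = sorted((math.dist(points[i], points[j]), j)
                        for j in range(n) if j != i)
        knn.append({j for _, j in ranked[:k]})
    edges = {}
    for i in range(n):
        for j in knn[i]:
            # Link only mutual k-nearest neighbors; weight = shared count.
            if i < j and i in knn[j]:
                edges[(i, j)] = len(knn[i] & knn[j])
    return edges
```

Because the weight depends only on neighbor ranks, a uniform scaling of the data leaves the graph unchanged.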
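The abstract says the shared-neighbor graph is split into roughly equal-sized subclusters by recursive bisection and flood fill, but does not give details. The sketch below is one hypothetical reading: a breadth-first flood fill from the lowest-numbered node absorbs nodes until half of them are reached, and the two halves are bisected again until no part exceeds `max_size` (an assumed parameter). Function names are illustrative only.

```python
from collections import deque


def flood_fill_bisect(nodes, adj):
    """Split nodes in two by breadth-first flood fill from one seed.

    The fill stops once half of the nodes are reached; nodes it cannot
    reach (other connected components) go to the second half.
    """
    nodes = sorted(nodes)
    target = len(nodes) // 2
    allowed = set(nodes)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue and len(seen) < target:
        u = queue.popleft()
        for v in sorted(adj.get(u, ())):
            if v in allowed and v not in seen and len(seen) < target:
                seen.add(v)
                queue.append(v)
    return sorted(seen), [v for v in nodes if v not in seen]


def recursive_bisect(nodes, adj, max_size):
    """Bisect recursively until every part has at most max_size nodes."""
    if len(nodes) <= max_size:
        return [sorted(nodes)]
    first, second = flood_fill_bisect(nodes, adj)
    return recursive_bisect(first, adj, max_size) + recursive_bisect(second, adj, max_size)
```

On a path graph 0-1-...-7 with `max_size=2`, this yields the parts `[0, 1]`, `[2, 3]`, `[4, 5]`, `[6, 7]`; growing each half by flood fill keeps the parts connected whenever the graph allows it.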
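The abstract does not define the first-truncation method used to read the dendrogram. As a stand-in, the sketch below uses a common, different heuristic: cut the dendrogram inside the largest jump between consecutive merge heights, so that merges below the jump are kept and the rest are undone. It only illustrates how dendrogram heights can fix the number of clusters; it is not the thesis's rule, and the function name is hypothetical.

```python
def clusters_from_largest_gap(merge_heights):
    """Number of clusters obtained by cutting at the largest height gap.

    merge_heights: heights of the n - 1 merges of a dendrogram over n leaves.
    """
    heights = sorted(merge_heights)
    if len(heights) < 2:
        # With fewer than two merges there is no gap to compare.
        return 1
    n_leaves = len(heights) + 1
    gaps = [heights[t + 1] - heights[t] for t in range(len(heights) - 1)]
    t = max(range(len(gaps)), key=gaps.__getitem__)
    # Merges 0..t are kept; each kept merge reduces the cluster count by one.
    return n_leaves - (t + 1)
```

For merge heights `[1.0, 1.1, 1.2, 5.0]` over five leaves, the largest jump lies before the last merge, so the cut leaves 2 clusters.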
Keywords/Search Tags: Nearest neighbors, Chameleon clustering algorithm, Natural neighbors, Shared neighbors, High-dimensional dataset