
Research On Clustering Algorithm Based On Shared Neighbor Affinity

Posted on: 2019-09-14
Degree: Master
Type: Thesis
Country: China
Candidate: H Xin
GTID: 2428330545953412
Subject: Computer Science and Technology
Abstract/Summary:
Data mining is the process of extracting hidden, previously unknown, and potentially valuable information from large volumes of data. Common data mining methods include classification, regression analysis, clustering, association rule extraction, deviation analysis, and Web page mining. Each analyzes the data from a different angle, and the information and patterns they discover are applied in practice. Clustering partitions a given dataset into disjoint, non-empty subsets. It lets people analyze the internal structure of the data and find groups of objects that share the same characteristics. Clustering is widely applied in fields such as data mining, pattern recognition, machine learning, and information processing, and it is one of the important research topics in data mining.

Building on a study of existing clustering algorithms, this thesis examines the similarity measure required by density-based clustering, analyzes the problems that existing similarity measures cause during clustering, proposes a new similarity measure, and, on that basis, proposes a clustering algorithm based on shared nearest neighbor affinity. In addition, this thesis studies existing cluster boundary detection algorithms and analyzes the distribution characteristics of cluster boundaries. Building on the boundary detection algorithm based on a matrix model, it uses boundary detection as a preprocessing step for clustering and uses the boundary information to guide the clustering process.

The main contributions of this thesis are as follows:
(1) The concept of affinity is defined by combining k-nearest neighbors and shared nearest neighbors, and a local density model based on this concept is proposed.
(2) Following the idea of clustering the core points first and then the non-core points, a clustering algorithm based on shared nearest neighbor affinity is proposed. Experimental results show that the algorithm can detect clusters of arbitrary shape, size, and density. Compared with similar algorithms, it achieves higher clustering accuracy on multi-density and high-dimensional datasets.
(3) An approach is proposed that first extracts boundary points with the MMC (clustering boundary detection based on matrix model) algorithm and then forms clusters from the inside out, starting from the core points and extending to the boundary points.
(4) A clustering technique based on matrix-model boundary detection is proposed, and experiments are conducted on datasets of different sizes and distributions. Experimental results show that the algorithm can effectively identify cluster boundaries and achieves good clustering performance.
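The abstract does not give the exact affinity or density formulas, so the following is only a minimal Python sketch of the general "shared neighbor affinity, then core points first" idea, under stated assumptions. It assumes a common shared-nearest-neighbor formulation: the affinity of two points is the number of k-nearest neighbors they share, counted only when each point is in the other's k-NN list, and a point's local density is the sum of its affinities to its k nearest neighbors. The function names (`knn_sets`, `snn_affinity`, `snn_cluster`) and parameters (`k`, `core_quantile`, `min_shared`) are hypothetical, and the MMC boundary-detection step is not modeled.

```python
import math
from collections import deque


def knn_sets(points, k):
    """For each point, the set of indices of its k nearest neighbors (self excluded)."""
    n = len(points)
    result = []
    for i in range(n):
        # Brute-force O(n^2) search; ties in distance are broken by index.
        order = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        result.append({j for _, j in order[:k]})
    return result


def snn_affinity(knn, i, j):
    """Hypothetical affinity: shared-neighbor count, nonzero only for mutual k-NN pairs."""
    if j in knn[i] and i in knn[j]:
        return len(knn[i] & knn[j])
    return 0


def snn_cluster(points, k=5, core_quantile=0.5, min_shared=2):
    """Core-first clustering sketch: link core points, then attach non-core points."""
    knn = knn_sets(points, k)
    n = len(points)
    # Assumed local density: total affinity of a point to its k nearest neighbors.
    density = [sum(snn_affinity(knn, i, j) for j in knn[i]) for i in range(n)]
    # Hypothetical core criterion: density at or above the given quantile.
    threshold = sorted(density)[int(core_quantile * (n - 1))]
    core = [d >= threshold and d > 0 for d in density]

    labels = [-1] * n
    cluster_id = 0
    # Step 1: connected components of core points joined by affinity >= min_shared.
    for start in range(n):
        if not core[start] or labels[start] != -1:
            continue
        labels[start] = cluster_id
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in knn[i]:
                if core[j] and labels[j] == -1 and snn_affinity(knn, i, j) >= min_shared:
                    labels[j] = cluster_id
                    queue.append(j)
        cluster_id += 1

    # Step 2: each non-core point joins the core neighbor it has the highest affinity with.
    for i in range(n):
        if core[i]:
            continue
        best, best_aff = -1, 0
        for j in knn[i]:
            a = snn_affinity(knn, i, j)
            if core[j] and a > best_aff:
                best, best_aff = labels[j], a
        labels[i] = best  # stays -1 (noise) if no core neighbor shares any neighbors
    return labels, core
```

For example, two well-separated 3x3 grids of points, clustered with `snn_cluster(pts, k=4, core_quantile=0.25, min_shared=1)`, come out as two clusters. Because affinity depends only on neighbor-list overlap rather than raw distance, the same threshold can in principle serve regions of different density, which is the motivation the abstract gives for a shared-neighbor measure.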
Keywords/Search Tags: Clustering, Density, Shared Neighbor, Clustering Boundary, Data Mining