
Research On Clustering Algorithm For Fast Recognition Of Density Backbone

Posted on: 2019-06-10 | Degree: Master | Type: Thesis
Country: China | Candidate: Y M Tang | Full Text: PDF
GTID: 2428330542494440 | Subject: Computer Science and Technology
Abstract/Summary:
Cluster analysis is one of the most active research directions in data mining. It discovers the distribution and patterns of data objects without supervision by dividing a collection into disjoint classes, each composed of similar objects. Clustering algorithms have been applied successfully in many fields, such as bioinformatics, network security, image processing, astronomy, and social networking. With the explosive growth of information, high-dimensional data sets are now stored across many industries. The sparsity of high-dimensional data usually lowers the accuracy of traditional clustering algorithms and can even cause them to fail, and processing massive data sets at high speed remains a challenge. Unlike traditional clustering algorithms, the first algorithm we propose quickly identifies density skeletons, and the second clusters data based on asymmetric boundary detection. The main innovations are as follows.

First, we propose a density calculation model based on k-nearest neighbors which, unlike traditional density calculation methods, applies to both low-dimensional and high-dimensional data sets. We then propose a technique that rapidly recognizes density skeletons using the consistency of neighbors and the relationship between the local densities of neighboring points. Because the density skeleton is obtained by computing the k-nearest neighbors only once, the efficiency of the recognition technique is improved. Building on this model and technique, we propose the ECLUB algorithm. Experimental results on data sets of different dimensions and sizes demonstrate the effectiveness and efficiency of the algorithm on high-dimensional data. In particular, its accuracy on face image data is demonstrated on the Olivetti face dataset.

Second, based on the observation that boundary points and noise deviate from the symmetry center and are unevenly distributed in space, we propose a boundary degree model, which describes how likely a data object in high-dimensional space is to be marked as a boundary point or an outlier. The boundary degree is the product of the deviation and the obliquity of the data object. Using the boundary degree, the data points are divided into internal points and external points, and an adjacency matrix is constructed. We propose a clustering algorithm based on asymmetric boundary detection via the boundary degree model; it traverses the adjacency matrix with breadth-first search to obtain the final clustering result. Experiments show that the algorithm effectively identifies clusters when the data sets contain different densities, complex shapes, noise, and "interference lines". Compared with similar algorithms, it achieves higher accuracy.
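
The abstract names two building blocks without giving their formulas: a density computed from k-nearest neighbors, and a breadth-first traversal of an adjacency matrix to produce clusters. The Python sketch below illustrates both under stated assumptions and is not the thesis's ECLUB or boundary-detection algorithm. The function names `knn_density` and `bfs_clusters` are hypothetical. The density formula (inverse of the mean distance to the k nearest neighbors) is a common choice assumed here, not taken from the thesis. The adjacency matrix is assumed to be a symmetric 0/1 matrix.

```python
from collections import deque
from math import dist  # Euclidean distance, Python 3.8+


def knn_density(points, k):
    """Assumed density model: inverse of the mean distance to the k nearest
    neighbors. The thesis does not state its formula; this is a stand-in.
    Requires 1 <= k < len(points)."""
    densities = []
    for i, p in enumerate(points):
        # Brute-force distances from p to every other point, ascending.
        d = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        mean_knn = sum(d[:k]) / k
        # Small constant guards against division by zero for duplicates.
        densities.append(1.0 / (mean_knn + 1e-12))
    return densities


def bfs_clusters(adjacency):
    """Label the connected components of a symmetric 0/1 adjacency matrix
    using breadth-first search. Returns one integer label per point."""
    n = len(adjacency)
    labels = [-1] * n  # -1 means "not visited yet"
    cluster = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = cluster
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adjacency[u][v] and labels[v] == -1:
                    labels[v] = cluster
                    queue.append(v)
        cluster += 1
    return labels
```

In this sketch a point with no neighbors in the matrix becomes its own single-point component. How the thesis assigns external points to clusters is not stated in the abstract, so that step is left out.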
Keywords/Search Tags: clustering algorithm, high-dimensional data, density skeleton, k-nearest neighbors, boundary degree