Font Size: a A A

Metric Learning And Clustering Algorithm Based On Transitive Distance

Posted on:2019-05-02Degree:MasterType:Thesis
Country:ChinaCandidate:T C DaiFull Text:PDF
GTID:2428330545970004Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Cluster analysis is one of the core problems of data mining,it is often used to group datasets according to certain rules or requirements.The research and development of clustering algorithms mainly depend on two aspects:metric learning and algorithm.Spectral clustering algorithm and hierarchical clustering algorithm are two typical algorithms.Although,both of them can make a good division of data,but in the actual study,we found that these two types of algorithms have some flaws.Spectral clustering algorithm is plagued by the chosen of scale factors during metering process.Meanwhile,the computational complexity is too large in the process of feature decomposition during clustering procedure.The hierarchical clustering algorithm has the problem that can not deal with complex-shaped data.Therefore,based on transitive distance,kernel trick by transitive distance and K-means duality,in this paper,the above problems are studied in depth.The main work of this article includes:1.Spectral Clustering Algorithm Based on Transitive DistanceHere are two main problems in the field of spectral clustering.The clustering performance of spectral clustering algorithms often influences by the mesoscale factors of metrics.Similarity measured by Euclidean distance is not always accurate.In this paper,transitive distance plays the role of metrics,and a corresponding spectral clustering algorithm based on this metric is proposed.The main idea contains three steps.First,a minimum spanning tree is constructed to do a similarity transformation.The output of the first step is a transfer matrix.Second,we construct a Laplacian matrix by the transfer matrix of the first step.Data is projected into the eigen-space of this Laplacian matrix.Third,we do the clustering in the space of the second step.The experimental results on artificial data sets and UCI data sets show that the Spectral clustering algorithm based on the transitive distance has good robustness and performance.2.K-means Duality Algorithm Based on Kernel Trick By Transitive DistanceSpectral clustering algorithm can get better clustering results when dealing with multiscale and complex shape data.However,this algorithm has a large amount of computation.The feature decomposition may leading to a high time complexity such as O(n3).In this case,we use the kernel trick to map sample data into a new space by transitive distance and use the K-means duality to directly cluster the mapped data in the new space.The experimental result on artificial data sets and UCI data sets show that the K-means duality algorithm based on kernel trick by transitive distance has good robustness and effectiveness.3.Hierarchical Clustering Algorithm Based on Transitive DistanceThe merge method in the hierarchical clustering algorithm is the closest to the basic principle of the clustering algorithm.However,when the sample data has a complex shape or has a large number of outlier data,the hierarchical clustering algorithm cannot obtain the correct aggregation class results.In this situation,we apply the principle of distance transfer to hierarchical clustering algorithms to establish a transfer matrix for sample data,and then merge the same categories in the transfer matrix to complete the clustering of sample data.The experimental result on complex-shaped data sets show that the hierarchical clustering algorithm based on transitive distance has good robustness and effectiveness.
Keywords/Search Tags:Clustering algorithm, Metric learning, Transitive distance, Gaussian metrics, Kernel trick by transitive distance, K-means duality
PDF Full Text Request
Related items