
Research On Topology Relation-based Distance Metric And Clustering Algorithms

Posted on: 2018-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Guang
Full Text: PDF
GTID: 2348330536987924
Subject: Computer Science and Technology
Abstract/Summary:
Cluster analysis is an important part of machine learning and has attracted considerable research attention. In cluster analysis, the distance metric is an important factor affecting the accuracy of an algorithm. Traditional clustering algorithms often use the Euclidean distance to measure the similarity between two samples and to partition the sample set. Although the Euclidean distance is easy to understand and implement, it assumes that the input space is isotropic, an assumption that is too strict and cannot always be guaranteed. In addition, the Euclidean distance considers only the similarity between the two samples themselves and ignores the information carried by all other samples. In this thesis, we propose two new distance metrics that can be used to discover the topological relationships among samples. The new methods do not require the input space to be isotropic; that is, the distance from one sample to another need not equal the distance in the opposite direction. The main innovations and contributions of this thesis are summarized as follows.
First, a new effective distance metric based on sparse reconstruction is proposed. The method evaluates the similarity between two samples using not only the distance between those two samples but also the distances between one specific sample and all other related samples. Sparse reconstruction coefficients are employed to reflect this global relationship among samples. We then develop four effective-distance-based clustering algorithms by applying the effective distance to four classical clustering algorithms: K-means, K-medoids, FCM, and spectral clustering. Experimental results on UCI benchmark datasets demonstrate the efficacy of the proposed methods.
Second, a novel spectral clustering method with mixed Euclidean and Kendall Tau metrics is proposed. The method considers both the similarity between pairs of samples and the similarity between their neighbors in order to learn the underlying structure of the dataset. Specifically, the new similarity metric is produced by a fusion algorithm that outputs an enhanced metric by combining multiple metrics, namely the Euclidean metric and the Kendall Tau metric. Moreover, the method uses a non-linear fusion of the different similarity metrics to examine the dataset from different aspects, and can therefore make effective use of different information in the data structure. Experimental studies on various datasets demonstrate that the proposed approach achieves performance superior to conventional methods.
Overall, the experimental studies on various datasets demonstrate that the two proposed distance metrics are effective and can improve the accuracy of clustering algorithms.
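To make the idea of mixing Euclidean and Kendall Tau metrics more concrete, the following is a minimal Python sketch. It is not the thesis's algorithm. The abstract does not give the fusion rule, so the sketch uses a product of a Gaussian-kernel Euclidean similarity and a Kendall Tau similarity as a stand-in non-linear combination. It also assumes the Kendall Tau distance is computed over the ordering of feature values within each sample. The function names and the `sigma` parameter are hypothetical.

```python
import math
from itertools import combinations


def euclidean_distance(x, y):
    """Standard Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def kendall_tau_distance(x, y):
    """Fraction of feature pairs (i, j) that x and y order in opposite directions.

    Assumption: the ranking is taken over the features of each sample.
    Requires at least two features. Ties count as not discordant.
    """
    pairs = list(combinations(range(len(x)), 2))
    discordant = sum(
        1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) < 0
    )
    return discordant / len(pairs)


def fused_similarity(x, y, sigma=1.0):
    """Illustrative non-linear combination of the two metrics.

    Assumption: a product of a Gaussian-kernel Euclidean similarity and
    (1 - Kendall Tau distance). The thesis's actual fusion is not specified
    in the abstract and may differ.
    """
    s_euclid = math.exp(-euclidean_distance(x, y) ** 2 / (2 * sigma ** 2))
    s_kendall = 1.0 - kendall_tau_distance(x, y)
    return s_euclid * s_kendall


def affinity_matrix(samples, sigma=1.0):
    """Pairwise fused similarities, usable as a precomputed affinity matrix."""
    return [[fused_similarity(a, b, sigma) for b in samples] for a in samples]
```

An affinity matrix built this way could be passed to any spectral clustering routine that accepts precomputed affinities. The product form means two samples count as similar only when they are close in Euclidean terms and also rank their features in a similar order.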
Keywords/Search Tags: Clustering analysis, distance metric, effective distance, Kendall Tau distance, similarity fusion