Using the Triangle Inequality to Accelerate Clustering Algorithms

Posted on: 2007-03-10
Degree: Master
Type: Thesis
Country: China
Candidate: C X He
Full Text: PDF
GTID: 2178360182994148
Subject: Computer software and theory
Abstract/Summary:
Cluster analysis is an important research field in data mining, and designing effective clustering algorithms for large-scale, high-dimensional datasets is a current research focus. Clustering groups a dataset into classes, or clusters, so that objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters. Further optimization of clustering algorithms not only helps complete the theory behind them but also promotes their adoption and application.

Sequential algorithms do not require the number of clusters to be fixed in advance, and they are direct and fast. When handling large amounts of data, however, their efficiency still needs improvement. To address this, on the basis of the Two-Threshold Sequential Algorithm Scheme (TTSAS), we propose a new sequential algorithm, TI_TTSAS, which avoids unnecessary distance calculations by applying the triangle inequality. Experiments show that the new algorithm is more effective on datasets with more dimensions and becomes increasingly effective as the number of clusters grows, while preserving the accuracy of the TTSAS algorithm.

The triangle inequality can improve more than sequential algorithms: in any algorithm that measures dissimilarity by Euclidean distance, redundant distance computations can be avoided by applying it. k-means is a popular partitioning clustering algorithm, and this thesis similarly uses the triangle inequality to avoid a large number of distance computations and reduce its running time. The experimental results show that the improvement to the k-means algorithm is remarkable.
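As an illustration of how the triangle inequality can remove distance computations, the following minimal Python sketch assigns a point to its nearest center. It is not the thesis's TI_TTSAS or its k-means variant, whose exact pruning rules are not given in this abstract. It uses one well-known consequence of the triangle inequality: if d(c_best, c_j) >= 2 * d(x, c_best), then d(x, c_j) >= d(x, c_best), so the distance from x to c_j need not be computed. The function names (euclidean, center_distances, nearest_center) are hypothetical and chosen only for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def center_distances(centers):
    """Precompute the symmetric matrix of center-to-center distances."""
    k = len(centers)
    cc = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            cc[i][j] = cc[j][i] = euclidean(centers[i], centers[j])
    return cc


def nearest_center(x, centers, cc):
    """Return (index, distance, computed) for the center nearest to x.

    `computed` counts how many point-to-center distances were actually
    evaluated; a brute-force search would always evaluate len(centers).
    """
    best = 0
    best_d = euclidean(x, centers[0])
    computed = 1
    for j in range(1, len(centers)):
        # Triangle inequality: d(best, j) <= d(x, best) + d(x, j).
        # If d(best, j) >= 2 * d(x, best), then d(x, j) >= d(x, best),
        # so center j cannot be strictly closer and is skipped.
        if cc[best][j] >= 2.0 * best_d:
            continue
        d = euclidean(x, centers[j])
        computed += 1
        if d < best_d:
            best, best_d = j, d
    return best, best_d, computed


if __name__ == "__main__":
    # Illustrative data only, not taken from the thesis.
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
    cc = center_distances(centers)
    print(nearest_center((1.0, 1.0), centers, cc))
```

In this sketch the center-to-center matrix costs k(k-1)/2 distance computations, but it is computed once and reused for every point. For the sample point (1.0, 1.0), which lies close to the first center, the other three centers are all pruned and only one point-to-center distance is evaluated.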
Keywords/Search Tags: triangle inequality, cluster algorithm, sequential algorithm, TTSAS, TI_TTSAS, k-means