
Y-means: A dynamic clustering algorithm

Posted on: 2004-11-14
Degree: M.C.S
Type: Thesis
University: University of New Brunswick (Canada)
Candidate: Guan, Yu
Full Text: PDF
GTID: 2468390011473019
Subject: Computer Science
Abstract/Summary:
This thesis introduces a new classification algorithm called Y-means. It is an unsupervised clustering algorithm based on K-means.

Classification is a learning process that allows us to find models (or rules) for projecting data onto a number of classes. Clustering is a category of classification methods based on unsupervised learning. It partitions objects into meaningful clusters so that objects in the same cluster are similar, while objects in different clusters are dissimilar.

K-means is a typical clustering method, well known for its simplicity and low time complexity. However, its usability is limited by three shortcomings: degeneracy, dependency on the number of clusters, and dependency on the initial centroids. Many extensions of K-means have been developed to solve particular problems, and to some extent they overcome some of these shortcomings. Y-means is also an extension of K-means, and it overcomes all three primary shortcomings. Y-means adjusts the value of k automatically by splitting and merging clusters according to the data distribution.

Y-means produced the same classification of the Iris data when the initial number of clusters and the initial centroids were varied. Trained on 101,000 randomly selected KDD-99 records and tested on 400,000 randomly selected KDD-99 records, Y-means attained an accuracy of 96.38%, a true positive rate of 99.98% and a false positive rate of 7.22%. A comparative analysis of the performance of Y-means, K-means, SOM, SVM and MLP shows that Y-means is a competitive unsupervised clustering algorithm.
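
The abstract does not state the splitting and merging criteria, so the following is only an illustrative sketch of the general idea, not the thesis's actual procedure. It wraps plain K-means in a loop that drops empty clusters (one way to handle degeneracy), splits off a point that lies too far from its centroid, and merges centroids that lie too close together. The names adaptive_cluster, split_radius and merge_radius, and the use of fixed distance thresholds, are assumptions made for this example.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def assign(points, centroids):
    """Group points by nearest centroid; returns one list per centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        clusters[idx].append(p)
    return clusters

def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iterations; empty clusters are dropped, so k may shrink."""
    for _ in range(max_iter):
        clusters = [c for c in assign(points, centroids) if c]
        new_centroids = [mean(c) for c in clusters]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids

def split_step(points, centroids, split_radius):
    """Hypothetical split rule: the point farthest from its nearest centroid
    becomes a new centroid if it lies beyond split_radius."""
    worst, worst_d = None, split_radius
    for p in points:
        d = min(dist(p, c) for c in centroids)
        if d > worst_d:
            worst, worst_d = p, d
    return centroids + [worst] if worst is not None else centroids

def merge_step(centroids, merge_radius):
    """Hypothetical merge rule: replace the first pair of centroids closer
    than merge_radius with their midpoint."""
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if dist(centroids[i], centroids[j]) < merge_radius:
                mid = tuple((a + b) / 2 for a, b in zip(centroids[i], centroids[j]))
                rest = [c for k, c in enumerate(centroids) if k not in (i, j)]
                return rest + [mid]
    return centroids

def adaptive_cluster(points, centroids, split_radius, merge_radius, max_rounds=50):
    """Alternate K-means with split and merge steps until k stops changing."""
    centroids = kmeans(points, list(centroids))
    for _ in range(max_rounds):
        k = len(centroids)
        grown = split_step(points, centroids, split_radius)
        if len(grown) > k:
            centroids = kmeans(points, grown)
            continue
        merged = merge_step(centroids, merge_radius)
        if len(merged) < k:
            centroids = kmeans(points, merged)
            continue
        break
    return centroids

if __name__ == "__main__":
    # Three well-separated toy groups (made-up data, not Iris or KDD-99).
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (20, 0), (20, 1), (21, 0), (21, 1)]
    print(sorted(adaptive_cluster(pts, [(0, 0)], 3.0, 3.0)))
    print(sorted(adaptive_cluster(
        pts, [(0, 0), (0, 1), (1, 0), (10, 10), (100, 100)], 3.0, 3.0)))
```

On this toy data, starting from one centroid or from five (one of which attracts no points) should lead to the same three clusters, which mirrors in miniature the insensitivity to the initial k and initial centroids that the abstract reports. A fixed distance threshold is the simplest possible stand-in; a criterion derived from each cluster's own spread would be a natural alternative.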
Keywords/Search Tags:Y-means, Clustering, Algorithm, K-means, Unsupervised, Classification