Y-means: A dynamic clustering algorithm

Posted on: 2004-11-14

Degree: M.C.S.

Type: Thesis

University: University of New Brunswick (Canada)

Candidate: Guan, Yu

GTID:2468390011473019

Subject:Computer Science

Abstract/Summary:

This thesis introduces a new classification algorithm called Y-means, an unsupervised clustering algorithm based on K-means.

Classification is a learning process that finds models (or rules) for mapping data onto a number of classes. Clustering is a category of classification methods based on unsupervised learning. It partitions objects into meaningful clusters so that objects in the same cluster are similar, while objects in different clusters are dissimilar.

K-means is a typical clustering method, well known for its simplicity and low time complexity. However, its usability is limited by three shortcomings: degeneracy, dependency on the number of clusters, and dependency on the initial centroids. Many extensions of K-means have been developed to address particular problems, and they overcome some of these shortcomings to some extent. Y-means is also an extension of K-means, and it overcomes all three of these shortcomings. Y-means adjusts the value of k by automatically splitting and merging clusters according to the data distribution.

Y-means produced the same classification of the Iris data when the initial number of clusters and the initial centroids were varied. Trained on 101,000 randomly selected KDD-99 records and tested on 400,000 randomly selected KDD-99 records, Y-means attained an accuracy of 96.38%, a true positive rate of 99.98%, and a false positive rate of 7.22%. A comparative analysis of the performance of Y-means, K-means, SOM, SVM, and MLP shows that Y-means is a competitive unsupervised clustering algorithm.
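The three K-means shortcomings named in the abstract can be illustrated with a small sketch. The code below is plain K-means (Lloyd's iteration), not Y-means: the abstract does not state the split and merge criteria, so they are not reproduced here. The function names, the toy data, and the reading of "degeneracy" as a cluster that ends up with no points are assumptions made for illustration only.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, max_iter=100):
    """Plain K-means from caller-supplied initial centroids (hypothetical helper).

    Returns (centroids, labels, empty), where `empty` lists the indices of
    clusters that hold no points in the last assignment computed.
    """
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        # (ties go to the lowest cluster index).
        labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(x) / len(members) for x in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster has no mean; keep its old centroid.
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    used = set(labels)
    empty = [j for j in range(len(centroids)) if j not in used]
    return centroids, labels, empty


# Toy data (assumed): two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]

# Reasonable start with k = 2: both groups are recovered.
good = kmeans(points, [(0, 0), (10, 10)])

# Poor start with k = 2: the far-away centroid never attracts a point,
# so one cluster stays empty.
degenerate = kmeans(points, [(0, 0), (100, 100)])

# k = 3 on the same data: the first group is split in two.
too_many = kmeans(points, [(0, 0), (1, 1), (10, 10)])
```

On this toy data the result changes with both the chosen k and the starting centroids, which is the behaviour an algorithm that adjusts k automatically is meant to avoid.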

Keywords/Search Tags:

Y-means, Clustering, Algorithm, K-means, Unsupervised, Classification

Related items

1	Implementation of the Unsupervised Possibility Fuzzy C-Means algorithm for classification of hyper spectral data
2	Fuzzy C-means And K-means Clustering Algorithm And Its Parallel
3	Study Of Chinese Text Clustering On Improved K-means Algorithm
4	Based On K-means The Chinese Text Clustering Algorithm
5	Improvement Of K-means Clustering Algorithm And Its Application
6	The Research On Parallel Computing Technology In Precise Agricultural Climate Division
7	Research On The Improvement Of C-means Clustering Algorithm
8	Probabilistic K-means Models Via Nonlinear Programming
9	The Improvement On The Fuzzy C-means Algorithm
10	Application Of Unsupervised Clustering Algorithm To Emitter Signals Analysis