PG-means: Learning the number of clusters in data

Posted on: 2008-12-20
Degree: M.S.
Type: Thesis
University: Baylor University
Candidate: Feng, Yu
Full Text: PDF
GTID: 2448390005470696
Subject: Artificial Intelligence
Abstract/Summary:
In this thesis we present a novel algorithm called PG-means, which automatically determines the number of clusters in a classical Gaussian mixture model. PG-means applies efficient statistical hypothesis tests to one-dimensional projections of both the data and the model to determine whether the examples are well represented by the model. The test is therefore applied to the entire model at once, not just on a per-cluster basis. We show that this method works well in difficult cases such as overlapping clusters, eccentric clusters, and high-dimensional clusters. PG-means also works well on non-Gaussian clusters and on data with many true clusters. Further, the new approach provides a much more stable estimate of the number of clusters than current methods.
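The projection idea in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes a Kolmogorov-Smirnov statistic as the one-dimensional test (the abstract does not name the test), and it uses hand-specified mixture parameters instead of a fitted model. All function names (`random_unit_vector`, `project_model`, `mixture_cdf`, `ks_statistic`) and the toy data are hypothetical, chosen for illustration only.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a random projection direction of unit length."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def project_point(x, p):
    """Project a point onto direction p (dot product)."""
    return sum(a * b for a, b in zip(x, p))


def project_model(weights, means, covs, p):
    """Project each Gaussian component onto p.

    A component N(mu, Sigma) becomes the 1-D Gaussian
    N(p . mu, p^T Sigma p). Returns (weight, mean, std) triples.
    """
    dim = len(p)
    projected = []
    for w, mu, cov in zip(weights, means, covs):
        mean = project_point(mu, p)
        var = sum(p[i] * cov[i][j] * p[j]
                  for i in range(dim) for j in range(dim))
        projected.append((w, mean, math.sqrt(var)))
    return projected


def mixture_cdf(t, projected):
    """CDF of the projected 1-D Gaussian mixture at t."""
    return sum(w * 0.5 * (1.0 + math.erf((t - m) / (s * math.sqrt(2.0))))
               for w, m, s in projected)


def ks_statistic(data, weights, means, covs, p):
    """Largest gap between the empirical CDF of the projected data
    and the CDF of the projected model (assumed test statistic)."""
    values = sorted(project_point(x, p) for x in data)
    n = len(values)
    projected = project_model(weights, means, covs, p)
    d = 0.0
    for i, t in enumerate(values):
        f = mixture_cdf(t, projected)
        # Compare against the empirical CDF just before and just after t.
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d


# Toy data (hypothetical): two well-separated 2-D clusters.
rng = random.Random(0)
data = [[rng.gauss(-5.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
data += [[rng.gauss(5.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]

identity = [[1.0, 0.0], [0.0, 1.0]]
# Candidate models as (weights, means, covariances), set by hand.
one_cluster = ([1.0], [[0.0, 0.0]], [[[26.0, 0.0], [0.0, 1.0]]])
two_clusters = ([0.5, 0.5], [[-5.0, 0.0], [5.0, 0.0]], [identity, identity])

# Test each whole model on a few random projections.
for _ in range(3):
    p = random_unit_vector(2, rng)
    print(ks_statistic(data, *one_cluster, p),
          ks_statistic(data, *two_clusters, p))
```

A larger statistic indicates a worse fit of the whole model along that projection. A complete test would compare the statistic against a critical value for a chosen significance level and add a cluster when the model is rejected; both steps are omitted here.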
Keywords/Search Tags: Clusters, PG-means