Efficient representation of cluster structure in large data sets

Posted on: 2002-10-23
Degree: Ph.D.
Type: Thesis
University: Tufts University
Candidate: Kantabutra, Sanpawat
Full Text: PDF
GTID: 2468390011492034
Subject: Computer Science
Abstract/Summary:
A clustering is a grouping of similar objects. Clustering has many applications, including visualization, pattern recognition, learning theory, computer graphics, and web search engines. In this thesis we introduce an innovative clustering algorithm called Ψ-clustering. Most clustering algorithms assume some form of data distribution for the input set in order to work effectively; Ψ-clustering makes no assumption about the data. This is a significant advantage, because the data distribution is rarely known in practice. Ψ-clustering also provides greater flexibility by letting users choose the degree of similarity between objects that places them in the same cluster. We also show that these clusters can be represented efficiently, so that they can be reconstructed in reasonable time when they are needed later. Finally, we exhibit an algorithm that finds these representatives for clusters.
Keywords/Search Tags: Data