Probabilistic model-based clustering of complex data

Posted on: 2004-10-06
Degree: Ph.D
Type: Dissertation
University: The University of Texas at Austin
Candidate: Zhong, Shi
Full Text: PDF
GTID: 1468390011475884
Subject: Engineering
Abstract/Summary:
In many emerging data mining applications, one needs to cluster complex data such as very high-dimensional sparse text documents and continuous or discrete time sequences. Probabilistic model-based clustering techniques have shown promising results in many such applications. For real-valued low-dimensional vector data, Gaussian models have been frequently used. For very high-dimensional vector and non-vector data, model-based clustering is a natural choice when it is difficult to extract good features or identify an appropriate measure of similarity between pairs of data objects.

This dissertation presents a unified framework for model-based clustering based on a bipartite graph view of data and models. The framework includes an information-theoretic analysis of model-based partitional clustering from a deterministic annealing point of view and a view of model-based hierarchical clustering that leads to several useful extensions. The framework is used to develop two new variations of model-based clustering: a balanced model-based partitional clustering algorithm that produces clusters of comparable sizes, and a hybrid model-based clustering approach that combines the advantages of partitional and hierarchical model-based algorithms.

I apply the framework and new clustering algorithms to cluster several distinct types of complex data, ranging from arbitrary-shaped 2-D synthetic data to high-dimensional documents, EEG time series, and gene expression time sequences. The empirical results demonstrate the usefulness of the scalable, balanced model-based clustering algorithms, as well as the benefits of the hybrid model-based clustering approach. They also showcase the generality of the proposed clustering framework.
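As a rough illustration of the deterministic annealing view of model-based partitional clustering mentioned above, the sketch below shows one common formulation, which is not necessarily the one used in the dissertation. Each object is softly assigned to each model with a weight proportional to exp(log-likelihood / T), and each model is then re-estimated from the weighted data while the temperature T is lowered. The unit-variance spherical Gaussian models, the function names `soft_assign` and `anneal_cluster`, the explicit `init_means` argument, and the temperature schedule are all assumptions made for this example.

```python
import numpy as np

def soft_assign(log_lik, temperature):
    """Assignment weights proportional to exp(log_lik / temperature).

    log_lik: (n_objects, n_models) array of log p(x_i | model_j).
    At temperature 1 this is the equal-prior posterior over models;
    as temperature approaches 0 it approaches a hard assignment.
    """
    scaled = log_lik / temperature
    scaled -= scaled.max(axis=1, keepdims=True)  # subtract row max for stability
    w = np.exp(scaled)
    return w / w.sum(axis=1, keepdims=True)

def anneal_cluster(X, init_means, temperatures=(5.0, 1.0, 0.1), iters=20):
    """Hypothetical annealed clustering with unit-variance Gaussian models."""
    X = np.asarray(X, dtype=float)
    means = np.asarray(init_means, dtype=float).copy()
    for T in temperatures:  # assumed schedule, from warm to cold
        for _ in range(iters):
            # Squared distance of every object to every model mean.
            d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
            # Gaussian log-likelihood up to an additive constant.
            w = soft_assign(-0.5 * d2, T)
            # Re-estimate each mean from its weighted objects
            # (small epsilon guards against an empty model).
            means = (w.T @ X) / (w.sum(axis=0)[:, None] + 1e-12)
    return means, w.argmax(axis=1)

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
means, labels = anneal_cluster(X, init_means=[[0, 0], [10, 10]])
```

Swapping the Gaussian log-likelihood for that of another model family (for example, a sequence model for time series) would leave `soft_assign` unchanged, which is the sense in which this kind of procedure is model-based rather than tied to a particular distance measure.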
Keywords/Search Tags:Model-based clustering, Data, Complex, Framework