Bayesian Decision Theoretical Framework for Clustering

Posted on: 2012-10-21
Degree: Ph.D
Type: Thesis
University: The Chinese University of Hong Kong (Hong Kong)
Candidate: Chen, Mo
Full Text: PDF
GTID: 2468390011468960
Subject: Information Technology
Abstract/Summary:
In this thesis, we establish a novel probabilistic framework for the data clustering problem from the perspective of Bayesian decision theory. The Bayesian decision theory view addresses two fundamental questions: what a cluster is, and what a clustering algorithm should optimize.
We prove that the spectral clustering algorithm (specifically, the normalized cut) can be derived from this framework. In particular, we show that the normalized cut is a nonparametric clustering method that adopts a kernel density estimator as its density model and tries to minimize the expected classification error, or Bayes risk.
Building on the Bayesian decision theoretical view, we propose several extensions of currently popular graph-based methods. We propose several data-dependent graph construction approaches that adopt more flexible density estimators. The advantage of these approaches is that the parameters for constructing the graph can be estimated from the data, so the constructed graph reflects the intrinsic distribution of the data. As a result, the algorithm is more robust and performs consistently well across different data sets. Flexible density models can, however, produce directed graphs, which traditional graph partitioning algorithms cannot handle. To tackle this problem, we propose general graph partitioning algorithms that deal with both undirected and directed graphs in a unified way.
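For readers unfamiliar with the normalized cut, the following is a minimal sketch of the standard two-way spectral relaxation on a graph whose edge weights come from a Gaussian kernel. It is not the derivation or the algorithm from the thesis: the function names, the fixed bandwidth, and the toy data are all assumptions chosen for illustration, and the bandwidth is set by hand here rather than estimated from the data as the thesis proposes.

```python
import numpy as np


def gaussian_affinity(X, bandwidth):
    """Symmetric affinity matrix with Gaussian kernel weights (hypothetical helper)."""
    # Pairwise squared Euclidean distances between all rows of X.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def normalized_cut_bipartition(W):
    """Two-way split from the relaxed normalized cut (hypothetical helper)."""
    d = W.sum(axis=1)                      # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)     # eigenvalues in ascending order
    # Map the second-smallest eigenvector back to the generalized problem
    # (D - W) y = lambda D y, then split on the sign of y.
    y = d_inv_sqrt * eigvecs[:, 1]
    return (y > 0).astype(int)


# Toy data (an assumption): two moderately separated 2-D blobs of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(20, 2)),
    rng.normal(3.0, 0.3, size=(20, 2)),
])
W = gaussian_affinity(X, bandwidth=1.0)    # bandwidth picked by hand
labels = normalized_cut_bipartition(W)
```

The sign of the eigenvector is arbitrary, so which blob receives label 0 or 1 can differ between runs or library versions; only the grouping is meaningful.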
Keywords/Search Tags:Bayesian decision, Clustering, Framework, Graph, Data