Font Size: a A A

Data clustering using statistical physics: Fundamentals, extensions, and applications

Posted on:2003-07-26Degree:Ph.DType:Dissertation
University:University of Colorado at Colorado SpringsCandidate:Kokkonen, Kim RolfFull Text:PDF
GTID:1468390011483123Subject:Computer Science
Abstract/Summary:
Data clustering methods based on the principles of statistical physics have become popular in recent years. The deterministic annealing (DA) algorithm and its enhancements have been used successfully in clustering and related tasks. However, a limitation of DA is that it finds only an approximation to the global optimum of the clustering objective function, with no known bound on its accuracy.; Using a pseudophysical model and a derivation based on pure statistical mechanics, we show that there is a unique closed-form clustering solution that achieves the global optimum. Although this solution is impractical to compute, its existence is used to prove that the deterministic melting algorithm follows the global optimum at all temperatures. This claim is supported by experiments with both simulated annealing and brute-force enumeration of local minima.; Although melting is a practical algorithm, it has significantly higher computational complexity than annealing and the requisite ground state is not always known. This motivates a new algorithm called deterministic tempering (DT) that has the same computational complexity as DA while approaching the global optimality of melting. DT is analogous to the physical process of tempering, which is characterized by repeated temperature cycles across carefully calibrated ranges, but DT automatically learns the number of cycles and the temperature ranges during operation. DT is demonstrated to follow the global optimum in cases where DA does not, while also avoiding the stochastic embellishments that have recently been used to improve DA's performance.; The physically motivated clustering methods are next extended by developing a new method of selecting the best number of clusters. We also show that the Euclidean distortion-based clustering objective fails to provide useful results for pattern recognition applications where the clusters have high ellipticity, and consider several methods of solving this problem. We evaluate preclustering—the use of a fast and simple approximate clustering method—to reduce the number of data points supplied to the slower but more accurate physical clustering methods.; Finally, these clustering methods are applied to textured image segmentation, with successful results and advantages shown for each of the new techniques.
Keywords/Search Tags:Clustering, Statistical, Global optimum
Related items