Font Size: a A A

Research And Applications Of Clustering Algorithms With The Model Selection Ability

Posted on:2015-11-23Degree:DoctorType:Dissertation
Country:ChinaCandidate:S J ZhuFull Text:PDF
GTID:1228330422992742Subject:Communication and Information System
Abstract/Summary:PDF Full Text Request
Clustering is an important tool in pattern recognition, which has many applications inareas such as bioinformatics, web data analysis, information retrieval, customer relationshipmanagement, text mining, and scientific data exploration. It aims to partition a finite, unla-beled data set into several natural subsets so that data objects within the same clusters areclose to each other and the data objects from different clusters are dissimilar from each otheraccording to the predefined similarity measurement.In cluster analysis, one of the most challenging and difcult problems is the determi-nation of the number of clusters in a data set, which is a basic input parameter for mostclustering algorithms. Although for some applications, users can determine the number ofclusters, K, in terms of their expertise, under more circumstances, the value of K is unknownand needs to be estimated exclusively from the data themselves. It is obvious that the qualityof resulting clusters is largely dependent on the estimation of K. A division with too manyclusters complicates the result, therefore, makes it hard to interpret and analyze, while a di-vision with too few clusters causes the loss of information and misleads the final decision.The procedure of selecting K is called model selection.This thesis studies the clustering algorithm with automatic model selection capabilitiesand their application in the following three aspects:(1) The theory of k-means algorithm is simple, and it is easy to implement. among allthe clustering algorithm, k-means is the most widely used.However, there are two seriousproblems with this algorithm: The cluster number needs to be manually specified and thedead unit problem. Dead unit problem is that if the initial position of a center is far awayfrom the data region, it will never has the opportunity to learn. In order to solve these twoproblems, after years of development, finally we have Rival Penalization Competitive Learn- ing(RPCL). This method can automatically determine the number of clusters in the clus-tering process。And because of this advantage to automatically determine the number ofclusters, we can init the cluster number a little more than the real cluster number, thus ignor-ing the impact caused by dead unit problem. However, RPCL has its own drawbacks. First,some empirical studies have found that its performance is sensitive to the selection of thedelearning rate. If the rate is not well preselected, the RPCL may completely break down.Second, since RPCL is an online learning algorithm, we need to manually specify a learn-ing rate carefully. If the learning rate is too big, the algorithm will be hard to converge. On theother hand, if the learning rate is too small, the algorithm will be slow down. Third, some em-pirical studies have also found that when the given cluster number is far bigger than the truecluster number K, RPCL will fail almost every time. This thesis improves RPCL algorithm inthe following three aspects. First, we introduce a Dirichlet process prior to independent cri-terion, the use of this prior makes our clustering algorithm more polymerizable. Second, weintroduce a method of automatically determining the learning rate, the method determinethe learning rate is proved to be in line with Newton’s descent method. Third, according tothe new independent criterion, we propose a method of automatically determining penaltyrates.(2) Image segmentation is a common pre-process in computer vision, in which theimage is divided into a few compact segments, according to the similarity of features andthe proximity in space. Due to the inherent consistency of image segmentation and clus-tering, clustering-based method is commonly used unsupervised image segmentation al-gorithm.Bayesian Ying-Yang harmony learning(BYY) is proposed by professor Xu Lei fromChinese University of Hong Kong. In this theory, the real world X and its representation Yare expressed by Ying and Yang under the Bayesian framework. And suggested that althoughthese two should be equal representation in theory, but in fact, due to various constraints,they are not equal. The goal of Bayesian Ying-Yang harmony learning is to make the two rep-resentation harmony. In order to fully investigate the effectiveness of BYY based image seg-mentation algorithms, we design an novel method for natural images. We develop a cluster- ing algorithm based on BYY harmony learning theory with Dirichlet-Normal-Wishart prior.Our algorithm updates the parameters with batch rules and eliminating some complex termsthat need a lot of computing resources, thus greatly accelerate the clustering process. Duringthe clustering procedure, this algorithm keeps the strong capability of determine the numberof components automatically, which inherits from BYY systems. Furthermore, we developpost process methods for clustering based image segmentation algorithms. An importantimprovement comparing with other clustering based methods is that we assign the labelswith regard to superpixels, but not pixels. We conduct extensive experiments to comparethe results with human segmentation using the BSDS500. The segmentation results matchvery well with those by humans and are competitive with the state-of-the-art segmentationalgorithms.(3) Face information processing is one of the most important field computer vision.Dueto the wide application of cameras and video surveillance systems, face data are in the rapidgrowth all the time, which in turn stimulates the needs of processing face data automat-ically.After years of research, face information processing has made remarkable improve-ments and great strides. Still, it seems there is a very simple question has no answer: Given aset of face data, how many individuals are included in the data set? To solve the above prob-lem, there are the following three main major difficulties. First, how to extract robust facialfeatures from the face images. Secondly, how to choose a compact and discriminative subsetof face features to distinguish different individuals. Third, how to design a clustering algo-rithm to automatically determine the number of clusters. A novel framework for clusteringfaces has been proposed. Faces can be clustered without knowing the amount of individuals,and discriminative facial features can be selected automatically. Two types of facial features,appearance features and anthropometric facial features, are extracted to represent a per-son. First, a small number of face images are annotated to train an AAM, and determine acompact set of discriminative facial features. Then, the trained AAM will fit to all faces andeach face will be represented by a feature vector. Finally, a Bayesian nonparametric model isemployed to cluster faces. We validate the proposed framework with extensive experiments, results show that this framework is feasible. This framework can be used in face register tolargely reduce the manual work, and also can be used in photo labeling and face carvingclustering.
Keywords/Search Tags:Pattern Recognition, Clustering, Model Selection, Rival Penalized CompetitiveLearning, Bayesian Ying-Yang Harmony Learning, Image Segmentation, Bayesian Non-parametric Methods
PDF Full Text Request
Related items