
Exemplar-Based Clustering With/Without Constraints And Transfer Learning

Posted on: 2017-01-18 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: A Q Bi | Full Text: PDF
GTID: 1108330488980608 | Subject: Light Industry Information Technology
Abstract/Summary:
Artificial intelligence, and machine learning in particular, has deeply affected our lives. Clustering, a typical unsupervised learning model, has long been a focus of research, and numerous achievements have been made. In some scenarios, the cluster centers must be chosen from the actual data points during clustering; we call such data points exemplars, and the corresponding algorithms exemplar-based clustering algorithms. At the same time, new scenarios keep emerging in machine learning, such as data streams, big data, link constraints, and transfer learning. To explore the prospects of exemplar-based clustering, this study focuses on these new scenarios and improves the clustering models to solve the corresponding problems. The main contributions are briefly introduced as follows:

1. Firstly, on the basis of the Bayesian probability framework and the Maximum a Posteriori (MAP) principle, we propose the Bayesian Exemplar-based clustering algorithm (BE for short). The BE algorithm, which unifies the objective functions of both the Affinity Propagation (AP) and Enhanced α-Expansion Move (EEM) clustering algorithms, is the starting point of this study. Since a Gaussian Mixture Model is capable of approximating a probability density of any shape, we define the prior probability of the exemplar set and the probability between a data point and its exemplar according to its probability density function (a generic MAP sketch is given after this list). By introducing the Bayesian probability framework, we broaden the research strategy of exemplar-based clustering algorithms.

2. Secondly, on the basis of the BE algorithm, we propose an effective Probability Drifting Dynamic α-Expansion clustering algorithm (PDDE for short). PDDE embeds the similarity between previous data points and current data points into the objective function, keeping the exemplars of previous and current data points as close as possible. The framework can handle two kinds of similarity between previous and current data points, namely whether or not the current data points share some points with the previous ones. As a result, PDDE works well for clustering data streams.

3. Thirdly, this study proposes a new incremental exemplar-based clustering algorithm called the Incremental Enhanced α-Expansion Move algorithm (IEEM for short). IEEM processes large datasets chunk by chunk: it revises the exemplars accordingly during the iterations and obtains the clustering result for the entire dataset once the last data chunk has been processed (a chunk-by-chunk skeleton is sketched after this list). Moreover, the optimization procedure of EEM and its theoretical spirit carry over to IEEM directly, so no effort is needed to develop a new optimization framework.

4. Fourthly, to deal with pairwise link constraints, this study divides link constraints into loose link constraints and strong link constraints. We naturally integrate the EEM clustering algorithm with loose link constraints and propose the Bayesian Enhanced α-Expansion Move clustering algorithm with Loose Link Constraints (BEEMLC for short). Although BEEMLC directly adds a penalty term for loose link constraints to the objective function, it in fact handles both loose and strong link constraints. Furthermore, to solve the new objective function, we improve the optimization framework used in EEM.
5. Finally, to solve the transfer clustering problem, this study proposes a new algorithm called Transfer Affinity Propagation based on Kullback-Leibler distance (TAP_KL for short). By leveraging the Kullback-Leibler distance commonly used in information theory, TAP_KL measures the similarity relationship between the source data and the target data and embeds this relationship into the computation of the similarity matrix of the target data (an illustrative sketch follows this list). The optimization framework of AP can then be used directly to optimize the new objective function of TAP_KL. In this way, TAP_KL provides a simple algorithmic framework for the transfer clustering problem, in which only the similarity matrix needs to be modified.
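For readers unfamiliar with the MAP formulation mentioned in contribution 1, the following is a minimal sketch of a generic MAP objective for exemplar-based clustering; the notation (data set X, exemplar set E, assignment e(x_i)) is ours and not taken from the dissertation, which should be consulted for the exact BE objective.

\[
E^{*} \;=\; \arg\max_{E \subseteq X} \, p(E \mid X)
      \;=\; \arg\max_{E \subseteq X} \Big( \prod_{x_i \in X} p\big(x_i \mid e(x_i)\big) \Big)\, p(E),
\qquad e(x_i) \in E,
\]

where p(x_i | e(x_i)) is the likelihood of a data point given its exemplar (modelled with a Gaussian mixture density in the BE setting) and p(E) is the prior on the exemplar set; taking the negative logarithm turns the MAP problem into an energy minimization that α-expansion moves can optimize.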
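As a rough illustration of the chunk-by-chunk processing described in contribution 3, the sketch below simply carries exemplars forward from one chunk to the next; the helper cluster_chunk is a hypothetical placeholder for one exemplar-based pass (e.g. one EEM run), and the carry-over rule is our own simplification rather than the actual IEEM procedure.

    import numpy as np

    def incremental_exemplar_clustering(chunks, cluster_chunk):
        # `chunks`: iterable of (n_i, d) arrays; `cluster_chunk(points)`: any
        # exemplar-based routine (a hypothetical stand-in for one EEM pass)
        # that returns an (m, d) array of exemplar points chosen from `points`.
        exemplars = None
        for chunk in chunks:
            if exemplars is not None:
                # Re-cluster the previous exemplars together with the new chunk,
                # so the summary of earlier data influences the current solution.
                points = np.vstack([exemplars, chunk])
            else:
                points = chunk
            exemplars = cluster_chunk(points)
        # After the last chunk, the surviving exemplars summarise the whole dataset.
        return exemplars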
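Contribution 5 reduces transfer clustering to a modification of the AP similarity matrix. The sketch below illustrates that idea under our own assumptions: diagonal Gaussians are fitted to the source data and to each candidate exemplar's neighbourhood in the target data, and a negative KL term is added to the target similarities before running standard Affinity Propagation (here scikit-learn's implementation); the weighting scheme and the 10-nearest-neighbour estimate are illustrative only, not the dissertation's exact TAP_KL formulation.

    import numpy as np
    from sklearn.cluster import AffinityPropagation

    def gaussian_kl(mu_p, var_p, mu_q, var_q):
        # KL divergence between two diagonal Gaussians, summed over dimensions.
        return 0.5 * np.sum(np.log(var_q / var_p)
                            + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

    def tap_kl_sketch(source, target, beta=1.0):
        # Base similarity for AP: negative squared Euclidean distance on target data.
        diff = target[:, None, :] - target[None, :, :]
        sim = -np.sum(diff ** 2, axis=-1)

        # Transfer term: how well each candidate exemplar's local neighbourhood in
        # the target data matches the source distribution, via a Gaussian KL distance.
        mu_s, var_s = source.mean(axis=0), source.var(axis=0) + 1e-6
        transfer = np.empty(len(target))
        for k in range(len(target)):
            idx = np.argsort(-sim[:, k])[:10]  # crude local estimate around point k
            mu_k, var_k = target[idx].mean(axis=0), target[idx].var(axis=0) + 1e-6
            transfer[k] = -gaussian_kl(mu_k, var_k, mu_s, var_s)

        # Embed the transfer knowledge into the similarity matrix, then run plain AP.
        sim = sim + beta * transfer[None, :]
        return AffinityPropagation(affinity="precomputed", random_state=0).fit_predict(sim)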
Keywords/Search Tags: exemplar-based clustering algorithms, data stream, incremental algorithms, pairwise link constraints, transfer clustering, α-expansion move