Font Size: a A A

Studies On Pairwise Constraint Propagation And Its Applications In The Constrained Clustering Tasks

Posted on:2013-02-11Degree:DoctorType:Dissertation
Country:ChinaCandidate:Z Y FuFull Text:PDF
GTID:1118330362467366Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
In the research area of machine learning, pattern recognition and data mining etc., clusteringanalysis is one of the most important data analysis methods. The clustering analysis methodsare widely used in various fields, such as image analysis, bioinformatics, web data analysis, socialnetwork analysis and astronomy and so on. With the clustering analysis methods, people have beengiven a much wider understanding of the complex and large-scale data from these research fields.At the same time, the results of the clustering analysis is the basis of the further data processing.Therefore, it is very important to do the research about the data clustering analysis methods.However, it is well known that traditional clustering analysis is an ill-posed problem. It mayall be due to that the traditional clustering algorithms mainly work in a completely unsupervisedsetting and do not take any prior knowledge into consideration. Therefore, some supervised in-formation is expected to be exploited to guide the clustering analysis. Pairwise constraints amongthe data points are such supervised information, which includes the must-link pairwise constraintsthat specifies two objects being the same class, and the cannot-link pairwise constraints that spec-ifies two objects being in different classes. The clustering analysis with the pairwise constraints isnamed as the constrained clustering problem with the pairwise constraints.In the constrained clustering problem based on pairwise constraints, we usually suffer fromthe scarcity of pairwise constraints. The so-called pairwise constraint propagation problem is topropagated the initial pairwise constraints among the entire dataset. Through the pairwise con-straint propagation, we can obtain plenty of such supervised pairwise information. With these veryvaluable information, we can improve the performance of the constrained clustering tasks. Due tothe scarcity of the initial pairwise constraints, therefore the constraint propagation problem has thesemi-supervised learning structure, which means we may exploit the large amount unsuperviseddata to help perform the constraint propagation tasks. Several approaches have been proposed inthe past to deal with the challenging problem of pairwise constraint propagation. However, thesecurrent approaches have some serious limitation. For example, the pairwise constraint relation-ship between data points should be symmetric and the current constraint propagation methods donot consider this kind of symmetric property. Moreover, the constraint propagation process needs to first construct a similarity graph and the current graph construction manner does not incorpo-rate the supervised information in the pairwise constraints. In addition, the proposed constraintpropagation algorithms can only deal with the single-modal data, but not the multi-modal data,which exist in great numbers in the reality. How to sufficiently exploit the relationship of each datamodality to finish the pairwise constraint propagation is a very important research problem.To solve the current problems existed in the previous works, we deeply study the problemof pairwise constraint propagation and its application in the constrained clustering tasks, the mainworks and novelties are listed as follows:(1) We first consider the symmetric property in the problem of pairwise constraint propaga-tion and study the constraint propagation with the symmetric structure. We propose a symmetricgraph regularized pairwise constraint propagation algorithm. Under the framework of the symmet-ric graph regularization, we show that pairwise constraint propagation is equivalent to solving aLyapunov matrix equation. Moreover, we also model the pairwise constraint propagation problemas a dynamic and symmetric information spreading process. We prove that such symmetric infor-mation spreading process has the time-invariant spreading results and show that this time-invariantresults are also equivalent to solving the same Lyapunov matrix equation. Our research connectsthe symmetric pairwise constraint propagation with the Lyaponuv equation, which is widely usedin the area of control theory. The experimental results show that the symmetric graph regularizedconstraint propagation method can achieve very good constrained clustering performance.(2) The pairwise constraint propagation algorithms based on the semi-supervised learningframework need to first construct a similarity graph on the dataset, which is used to reflect the sim-ilarities between data points. We propose a similarity learning method for the pairwise constraintpropagation tasks. Our proposed method simultaneously considers to minimize the local recon-struction error and the local constraint error, which can be formulated as a quadratic optimizationproblem. We further apply the learned data similarities into the task of the pairwise constraintpropagation, of which the basic idea is to consider the two data points with higher learned similar-ity should have the same constraint settings with respect to the other data points. The experimentalresults illustrate that with the learned data similarities, the constraint propagation algorithm canimprove the clustering performance.(3) To the pairwise constraint propagation problem on multi-modal dataset, we propose amulti-modal constraint propagation method based on the multi-graph random walk process. Ourmethod first construct the similarity graphs on each data modality, then define the natural randomwalk process on each similarity graph, apply the multi-graph random walk process to combinemultiple similarity graphs to form the multi-graph structure, and then adopt the multi-graph la-bel propagation technique to solve the multi-modal constraint propagation problem. We furtherpropose a method to automatically learn the combination weights of the multiple graphs. Our pro- posed multi-modal pairwise constraint propagation method can be formulated as an unconstrainedquadratic optimization problem, which has a closed-form solution. The experimental results showthat our proposed the multi-graph random walk process based multi-modal constraint propagationcan effectively exploit the information for all of the data modalities and dramatically outperformthe single-modal methods.(4) We further propose a novel modalities consensus framework and apply it on the problemof multi-modal pairwise constraint propagation. Our proposed modalities consensus frameworkuses a consensus regularization item to force the constraint propagation results on multiple datamodalities to be consistent each other. For efficiently solving the modalities consensus framework,we adopt a separable modalities consensus regularizer. With this kind of separable modalities con-sensus regularizer, multi-modal constraint propagation problem can be effectively solved with analternating optimization way, which includes the independent constraint propagation process oneach data modality and the consensus constraint propagation process among all of the data modal-ities. With our modalities consensus framework, two single-modal constraint propagation algo-rithms can be reformulated as two well-defined multi-modal solutions for the problem of pairwiseconstraint propagation on the multi-modal dataset. The experimental results show that the multi-modal constraint propagation method based on the modalities consensus framework can achievethe best clustering performance compared to the previous methods.
Keywords/Search Tags:pairwise constraint propagation, constrained clustering, semi-supervisedlearning, multi-modal analysis
PDF Full Text Request
Related items