Font Size: a A A

The Utilization Of The Supervised Information In Graph-based Learning

Posted on:2013-03-27Degree:DoctorType:Dissertation
Country:ChinaCandidate:P HeFull Text:PDF
GTID:1268330422452720Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Graph-based learning has attracted more and more interest from the researcher in recent years. Itnot only has mathematical support from the graph theory, but also indicates the key to improve theperformance of the existing learning algorithms that the connection among data points is as importantas the data points. The supervised information is critical information resource as well as ultimatelearning target for supervised learning and semi-supervised learning in machine learning. In this paper,the supervised information in graph-based learning is partitioned into two categories, including vertexconstraints and edge constraints. The supervised classification and the semi-supervised clustering areviewed as vertex-constrained and edge-constrained learning problems respectively. We will putemphasis on the discussion of the utilization of the supervised information by these two groups oflearning algorithms.Traditional supervised classification algorithms prefer the attribute-based approaches in estimatingthe underlying functions that map the attributes to the class labels, which only extracts the informationabout vertices. Although the kernel method later introduces the pairwise relationship of the data points,it only computes with the conditional attributes, but has not extended the supervised information fromthe vertices to the edges. Similarly, most of the existing semi-supervised clustering algorithms bias thesolution space or learn an appropriate metric to make the clustering result as consistent with theprovided constraints as possible. However, they also fail to extend the superved information from theedges to the vertices for full utilization. This paper proposes to transfer the supervised informationthrough vertices and edges on graph, so that the utilization of the supervised information in these twogroups of learning problems can be improved.First, we propose a unified framework for nonlinear classifier design called Manifold MappingMachine (M3). It is composed of three stages, supervised manifold mapping, classifier constructionand out-of-sample extension. As a ‘vertex–edge-vertex’ approach, S3C integrates the classrelationship of the vertices into the weight of their edges (vertex-edge), and then transforms thedifferent classes of data into the low-dimensional target manifold separately (edge-vertex), whichsimplifies the subsequet classifier construction. In order to achieve a similar effect in the spacetransformation of the test data, we construct a ’’bridge’’ between the original manifold and the targetmanifold. By minimizing the difference of the test data mapped from the two manifolds on theintermediate ’’bridge’’, the optimal mapping of the test data can be determined uniquely. To prove the feasibility and the generalization ability of M3, we also discuss its connections with severalwell-known manifold learning algorithms as well.Second, we present a nonlinear classifier under the framework of M3, named Supervised SpectralSpace Classifier (S3C). It integrates the supervised information linearly to map the input data into thelow-dimensional supervised spectral space. Then S3C adopts three different classification algorithmsfor classifier construction. During the out-of-sample extension state, we prove that the optimalmapping of the test data derived by S3C through the manifold “bridge” has the same form as thatderived from the Nystr m Approximation method. Sufficient experimental results show that S3C issignificantly superior to other state-of-the-art nonlinear classifiers on both synthetic and real-worlddata sets.Last but not least, we propose a semi-supervised clustering algorithm named SCRAWL, short forSemi-supervised Clustering via RAndom WaLk, to deal with multi-class semi-supervised clusteringproblems given both must-link and cannot-link constraints. As an ‘edge-vertex-edge’ approach,SCRAWL first determines the propagation range of each constrained vertex (edge-vertex), followedby the expansion of the constraint influence according to the similarities of the vertices connected bythe unconstrained edges to the constrained vertices (vertex-edge). We define the propagation range ofeach constrained vertex and the degrees of its impact as an intermediate structure between thefine-grained vertex and the coarse-grained cluster, called “component”. By estimating propagationaccuracy of each component, SCRAWL can also adjust the strength of the constraint propagation overthe different components, so that the components with higher confidence can receive greater influencefrom the propagated constraints, while the components with lower confidence can only receive lessinfluence by contrast. The experiments on UCI database, text documents, handwritten digits,alphabetic characters, face recognition and image segmentations demonstrate the effectiveness andefficiency of SCRAWL.
Keywords/Search Tags:Graph-based Learning, Supervised Classification, Semi-supervised Clustering, ManifoldMapping, Spectral Method
PDF Full Text Request
Related items