
Study On Manifold-based Semi-Supervised Classification

Posted on: 2010-07-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L X Zhao
Full Text: PDF
GTID: 1118360302479574
Subject: Computational Mathematics
Abstract/Summary:
Semi-supervised learning, and semi-supervised clustering in particular, is an active topic in machine learning. Nonlinear dimensionality reduction based on manifold learning and Reproducing Kernel Hilbert Spaces are both important subjects in this field. This thesis concerns nonlinear dimensionality reduction for data classification, kernel learning based on pairwise constraints, and clustering methods. We address two lines of work: dimensionality reduction based on class probability, and kernel learning based on probabilistic pairwise constraints. The two parts are closely interrelated, and the kernel learning methods can also be used for dimensionality reduction.

1. Nonlinear dimensionality reduction for data classification.

(a) We propose a novel nonlinear dimensionality reduction algorithm, PLLE, which builds a distance metric from class probabilities (a minimal sketch follows this list). Unlike the Euclidean distance or the manifold geodesic distance, this metric retains the character of the Euclidean distance while also reflecting the class labels of the points, so it suits the whole data set and the test set better than a metric fitted to the training set alone. PLLE adopts the idea of the classical nonlinear dimensionality reduction method LLE while adding its own semi-supervised character, and it resolves the difficulty of introducing supervised information into manifold learning methods.

(b) The prediction of class probabilities is the key step of PLLE. The thesis proposes the PE algorithm, derived from the logistic discriminant method, to estimate the class-probability vector. PLLEc, the combination of PLLE and PE, is shown by numerical experiments to be an effective supervised classifier.

(c) We propose a semi-supervised dimensionality reduction method called PLE, a class-probability algorithm based on the Laplacian Eigenmap that adopts the idea of class-probability prediction. PLE can be used for data clustering, and the class probabilities can be obtained either by PE or by predictions based on kernel learning methods.
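To make the class-probability metric of part 1 concrete, here is a minimal sketch in Python. The blending form d(i,j) = ||x_i - x_j|| (1 + lam ||p_i - p_j||), the weight `lam`, and the use of scikit-learn's LogisticRegression as a stand-in for the PE step are all illustrative assumptions; the abstract does not spell out the exact construction.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Stand-in for the PE step: class-probability vectors from a logistic
# discriminant model (fitted on all labels here for simplicity; in the
# semi-supervised setting only part of y would be available).
probs = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)

# Blend the Euclidean distance with the distance between class-probability
# vectors, so points that likely belong to different classes are pushed
# apart. `lam` and the blending form are assumed for illustration.
lam = 1.0
d_euc = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
d_prob = np.linalg.norm(probs[:, None, :] - probs[None, :, :], axis=-1)
d_plle = d_euc * (1.0 + lam * d_prob)

# A PLLE-style method would pick each point's neighbors under this metric
# and build LLE reconstruction weights on those neighborhoods.
neighbors = np.argsort(d_plle, axis=1)[:, 1:11]  # 10 nearest, self excluded
print(neighbors[:3])
```

From these neighborhoods, PLLE itself would go on to compute LLE reconstruction weights and the low-dimensional embedding; the sketch only mimics the role the metric plays in that pipeline.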
2. Kernel learning based on pairwise constraints.

(a) Many dimensionality reduction algorithms seek the best linear projection for data sets with pairwise constraints, but such methods are highly sensitive to the distribution of the data. To address this, we propose a novel method that determines a probability function, or class-probability vector, for classification from the kernel matrix produced by PCP. Combining it with PLE gives the classification method PCP-PLE; taking the numbers of dimensions and classes into account yields an updated version, PCP-PLE*. Both algorithms contain an implicit mapping designed for classification, so they achieve better dimensionality reduction for data under various distributions. Numerical results show that PCP-PLE* outperforms some of the latest algorithms in the same scenario.

(b) We analyze Pairwise Constraint Propagation (PCP) in detail and find that it has limitations as a kernel learning method: when kernel K-means is used for clustering, the Normalized Mutual Information does not always improve as the number of constraints grows, which means the performance of PCP depends strongly on the distribution of the pairwise constraints (see the sketch after this abstract). We therefore propose the PPCP model, which relaxes the pairwise constraints into probabilistic constraints. PPCP can be more effective than PCP in some cases. Furthermore, based on the probability function designed above, PPCP can be applied without any prior pairwise-constraint knowledge, so it serves in more settings as an unsupervised clustering algorithm.

(c) Based on the probability function, we propose a family of active kernel learning algorithms, active-PCP and active-PPCP. They adaptively search for negative pairwise constraints and handle them by elimination or relaxation, which improves classification performance. In addition, we propose an approach that enlarges the set of pairwise constraints automatically, making the algorithms more suitable for classification.

The thesis consists of six chapters. Chapter 1 states the basic concepts, background, and development of the subject, and lists the contributions of the thesis. Chapter 2 gives a brief review of manifold learning and kernel methods. The main algorithms are presented in Chapters 3, 4, and 5. Concluding remarks and future perspectives appear in the last chapter.
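The following sketch illustrates the kind of evaluation described in part 2(b): clustering quality, measured by Normalized Mutual Information, as pairwise constraints are added. Clamping kernel entries for must-link and cannot-link pairs is only a crude surrogate for the actual PCP propagation, and the data set, `gamma`, and constraint-sampling scheme are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.cluster import SpectralClustering
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(0)
X, y = make_moons(n_samples=200, noise=0.08, random_state=0)
K = rbf_kernel(X, gamma=8.0)  # base kernel before constraints

for n_con in (0, 20, 80):
    K_c = K.copy()
    # Sample random pairs; treat same-label pairs as must-link (affinity 1)
    # and different-label pairs as cannot-link (affinity 0).
    for _ in range(n_con):
        i, j = rng.integers(0, len(X), size=2)
        K_c[i, j] = K_c[j, i] = 1.0 if y[i] == y[j] else 0.0
    labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                                random_state=0).fit_predict(K_c)
    print(n_con, "constraints -> NMI:",
          normalized_mutual_info_score(y, labels))
```

Running such an experiment typically shows that NMI need not grow monotonically with the number of constraints, which mirrors the sensitivity to the constraint distribution that the PPCP relaxation targets.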
Keywords/Search Tags:Manifold Learning, Semi-Supervised Learning, Classification, Dimension Reduction, Kernel method