Font Size: a A A

The Study Of Manifold Learning Algorithms And Their Applications

Posted on:2012-09-01Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y K LeiFull Text:PDF
GTID:1118330335962430Subject:Pattern Recognition and Intelligent Systems
Abstract/Summary:PDF Full Text Request
Manifold learning is a new kind of nonlinear dimensionality reduction method for finding low-dimensional compact representations of high-dimensional observation data and exploring the inherent law and intrinsic structue of data. At present, manifold learning has become a hot issue in the fields of data mining, pattern recognition, machine learning, and other related research topics. These manifold learning methods do yield impressive results on some artificial and real world benchmark data sets due to their nonlinear nature, geometric intuition, and computational feasibility. However, the original manifold learning methods still show some common problems, such as out-of-sample, supervised learning, large-scale manifold learning, and so on. In order to overcome these problems, this dissertation carries out a series of researches on algorithm design and their applications in image and protein-protein interactions (PPI) data. Firstly, classical manifold learning methods were analyzed and compared in detail. Secondly, five problems about manifold learning were mainly investigated, which include out-of-sample, supervised learning, characterization of local manifold geometry, construction of globally regularized linear regression model, and large-scale manifold learning. Finally, in this dissertation we proposed three manifold learning algorithms. Our proposed algorithms were compared with the related researches in theories and experiments. And the results demonstrate the effectiveness of our proposed algorithms.The main work for this dissertation can be summarized as follows:(1)Based on the analyses of local spline embedding (LSE) method, we proposed an efficient feature extraction algorithm called orthogonal local spline discriminant projection (O-LSDP). By introducing an explicit linear mapping, constructing different translation and rescaling models for different classes as well as orthogonalizing feature subspace, O-LSDP can effectively circumvent the two major shortcomings of the original LSE algorithm, i.e., out-of-sample and unsupervised learning. O-LSDP not only inherits the advantages of LSE which uses local tangent space as a representation of the local geometry so as to preserve the local structure, but also makes full use of class information and orthogonal subspace to significantly improve discriminant power. Extensive experiments on standard face databases verify the feasibility and effectiveness of the proposed algorithm. (2)A new manifold learning algorithm called local multidimensional scaling regression embedding (LMDSRE) was developed under the conceptual framework of compatible mapping. LMDSRE is to use local multidimensional scaling (LMDS) to construct the local coordinates of each data point and its neighbors for representing local geometry structure of low-dimensional manifold. Regularized linear regression models are then fitted to map each of the local coordinates to its own single low-dimensional global coordinate. LMDSRE as a new manifold learning method has the characteristic of local isometry and can be used for nonlinear dimensionality reduction and data visualization analysis. The experiments on six toy datasets and three real-world datasets illustrate the validity of our method.(3)For the high complexity problem of the ISOMAP algorithm, we designed a new fast isometric feature mapping (Fast-ISOMAP) method. Fast-ISOMAP is to utilize minimum set cover strategy to designate p ones among all data points to be landmark points. Instead of computing Dn×n, we only computed the p×n matrix D p×n of distances from each data point to the landmark points in constructing the shortest path distance matrix. Landmark MDS is then applied to embed all data points to low-dimensional feature subspace. It was found in experiments that Fast-ISOMAP can greatly improve the computational efficiency of the original ISOMAP and be used in large-scale manifold learning problems under the condition that it does not significantly change the performance of ISOMAP. Experimental results on many artificial benchmark datasets show the effectiveness of our proposed algorithm.(4)We developed a robust computational technique for assessing the reliability of protein interactions and predicting new protein interactions by fast manifold embedding algorithm. Firstly, we adopted low-dimensional manifold modeling to fit a PPI network and utilized our proposed fast isometric feature mapping (Fast-ISOMAP) to transform a PPI network into a low dimensional metric space, which recasts the problem of assessing and predicting protein interactions into the form of measuring similarity between the points located in its metric space. Then a reliability index (RI), a likelihood indicating the interaction of two proteins, is assigned to each protein pair in the PPI networks based on the similarity between the points in the embedding space. The performance of the proposed approach is evaluated by using functional homogeneity and localization coherence of protein interactions from three PPI datasets that are derived from various scales and high-throughput techniques, i.e., yeast-two-hybrid (Y2H), tandem affinity purification (TAP), and mass spectrometry (MS). Experimental results demonstrate that the interactions ranked top by our method have high functional homogeneity and localization coherence. Our proposed method can effectively and efficiently overcome the disadvantages that most existing methods require additional prior information and are sensitive to the sparseness of PPI network. Moreover, to our knowledge, our proposed method is the first research work aiming at utilizing manifold learning theory to assess and predict protein interactions. Therefore, the proposed algorithm is a much more promising method to detect both false positive and false negative interactions in PPI networks.
Keywords/Search Tags:Manifold learning, Dimensionality reduction, Orthogonal local spline discriminant projection, Local multidimensional scaling regression embedding, Fast isometric feature mapping, Face recognition, Data visualization, Protein-protein interactions
PDF Full Text Request
Related items