
Gaussian Process Based Dimensionality Reduction Models

Posted on: 2013-08-05    Degree: Doctor    Type: Dissertation
Country: China    Candidate: X W Jiang    Full Text: PDF
GTID: 1228330392455482    Subject: Computer application technology
Abstract/Summary:
In this era of information explosion, the rapid development of science and technology, especially computer technology, has produced high-dimensional data in almost every field, and these data are often nonlinearly distributed. Extracting useful information from such data is becoming increasingly important. Many data analysis techniques are based on dimensionality reduction (DR) models, and because of the curse of dimensionality, DR has also become an essential step in data mining. Depending on whether they use supervised label information, DR techniques can be divided into unsupervised and supervised models. Principal Component Analysis (PCA) is a classical unsupervised DR technique that has been widely used in real-world data analysis tasks. However, traditional unsupervised DR methods often cannot make use of extra label information (real values for regression tasks or discrete values for classification tasks), which is provided with some data sets or can easily be obtained from an important intrinsic property of the data (as in face and digit-character data sets). Wasting this valuable information is unwise. Many studies have shown that extra label information can improve the performance of DR models; Linear Discriminant Analysis (LDA) is a well-known example. This idea has driven the development of supervised DR models, which perform well in some tasks. However, these models still have many drawbacks, such as ineffective use of the labeled data and high computational complexity.

This dissertation focuses on DR models in the Bayesian setting and proposes one novel unsupervised and two supervised DR models.
Simulated and real-world data sets are used to test the three models, and the results show that the new DR models perform well in many DR tasks. The main content of this dissertation can be summarized as follows:

1) Review the common DR models. Because there are a large number of DR models, this dissertation first divides them into unsupervised and supervised approaches. Within each class, the models are further divided into spectral-analysis-based and latent variable model (LVM) based DR algorithms, according to whether the relationship between the low-dimensional space and the high-dimensional observation space is explicitly modeled. One to three representative models are reviewed in detail for each class, followed by a summary of related work. A brief analysis of the pros and cons of each model type is also provided.

2) Propose a novel unsupervised DR model and two extensions of it. The new model is an LVM-based model that explicitly uses a thin-plate spline function to model the nonlinear relationship between the latent variables and the observed variables. It can therefore exploit special properties of the thin-plate spline model, which makes it particularly suitable for data sets whose latent dimensionality is low and/or that contain rotation and translation structure. The model can also be viewed as the classical Gaussian process latent variable model (GPLVM) with a special kernel function. Two extensions, dynamics and back-constraints, are incorporated into the new LVM-based model and provide better performance in specific tasks.

3) Propose a new supervised DR model. Based on an analysis of existing supervised LVM-based DR models, a new supervised extension of GPLVM is proposed, which can be interpreted both from the perspective of LVMs and as a semi-parametric model in the classical supervised DR framework.
The new model improves DR performance and also reduces the complexity of the algorithm. In addition, an extension of the proposed supervised DR model is given, which performs better when the number of training data points is large.

4) Propose a novel gradient learning model. This model can be seen as an indirect supervised DR model based on spectral analysis. It directly extends the existing gradient learning model to a Bayesian statistical framework based on Gaussian processes (GPs). The extension provides error bars on the estimates, which the original gradient learning model does not, and it estimates gradients more accurately.
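
To illustrate what "error bars" from a Gaussian process look like in practice, here is a minimal sketch of plain GP regression in NumPy. It is not the dissertation's gradient learning model or its thin-plate spline GPLVM. The squared-exponential kernel, the function names `rbf_kernel` and `gp_predict`, and all hyperparameter values are assumptions chosen only for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    # Squared-exponential covariance between 1-D input arrays a and b.
    # (Assumed kernel; the abstract does not specify one for this purpose.)
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale ** 2)


def gp_predict(x_train, y_train, x_test, noise=1e-2):
    # Posterior mean and standard deviation of a zero-mean GP at x_test.
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_st = rbf_kernel(x_test, x_train)
    k_ss = rbf_kernel(x_test, x_test)

    # A Cholesky factorisation avoids forming an explicit matrix inverse.
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_st @ alpha

    # Predictive variance: prior variance minus what the data explain.
    v = np.linalg.solve(chol, k_st.T)
    var = np.diag(k_ss) - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


# Toy data (hypothetical): noiseless samples of sin(x) on [-3, 3].
x_train = np.linspace(-3.0, 3.0, 15)
y_train = np.sin(x_train)

# One test point inside the training range and one well outside it.
x_test = np.array([0.0, 6.0])
mean, std = gp_predict(x_train, y_train, x_test)
```

The standard deviation is small where training data are dense and grows toward the prior standard deviation far from the data. A point estimator without a probabilistic model does not supply this kind of uncertainty estimate.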
Keywords/Search Tags: Gaussian Process, Dimensionality Reduction, Latent Variable Model, Thin-plate Spline, Gradients Learning