
Research On Generalized Canonical Correlation Analysis Of Data Dimensionality Reduction

Posted on: 2012-10-04    Degree: Doctor    Type: Dissertation
Country: China    Candidate: X H Chen    Full Text: PDF
GTID: 1118330362966662    Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the rapid development of data collection and storage, high-dimensional data such as spaceflight remote-sensing data, biological data, network data, and financial market data have emerged. How to represent such high-dimensional data in a low-dimensional space and discover their intrinsic structure is an important topic in machine learning and pattern recognition. Over the past decades, a large family of methods has been designed to provide different solutions to the dimensionality reduction problem; they are widely used in image analysis and processing, multimedia processing, medical data analysis, climate forecasting, computer vision, cross-lingual text classification, and other areas. In this thesis, we focus on dimensionality reduction of high-dimensional data. We analyze the advantages and disadvantages of existing correlation-based dimensionality reduction algorithms, including CCA and its variants, and, based on generalized correlation analysis, we design a series of efficient dimensionality reduction methods for single-view and multi-view data respectively. The major contributions of this thesis are as follows:

(1) We design a novel supervised dimensionality reduction method for supervised single-view data. When CCA is used to reduce the dimension of supervised single-view data, the original data are usually kept as the first view and the corresponding class-label encodings are taken as the second view. It has been proved that when the labels are one-of-C or one-of-(C-1) encodings and all instances in a class share a common label encoding, performing CCA on the constructed two-view data is equivalent to performing LDA on the original single-view supervised data. Based on an analysis of the reasons for this equivalence, we propose two dimensionality reduction methods induced by classifier design. They realize dimensionality reduction of supervised single-view data through correlation analysis; on the one hand, they are not equivalent to LDA because of their newly defined objective functions, and on the other hand, they have lower complexity in both the training and testing stages. Experimental results on artificial and real-world datasets validate their efficiency compared with other related methods.

(2) Integrating the idea of large-margin learning, we develop a novel supervised dimensionality reduction method for supervised single-view data. The method aims to maximize the minimal correlation between each projected instance and its class label, and is therefore named Large Correlation Analysis (LCA). Unlike most existing correlation analysis methods, which maximize the total or ensemble correlation over all training instances, LCA maximizes the individual correlation between each instance and its associated label. The objective function of LCA is converted into a relaxed quadratic program with box constraints, which can be solved effectively by the projected Barzilai-Borwein (PBB) method. Experimental results on real-world datasets from UCI and USPS show its effectiveness compared with existing related dimensionality reduction methods.
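For illustration only: contribution (2) reduces the LCA objective to a box-constrained quadratic program solved by the projected Barzilai-Borwein (PBB) method. The abstract does not specify that program's matrices, so the following is merely a minimal sketch of a generic projected BB iteration for min 0.5*x'Ax - b'x subject to lower <= x <= upper; the names A, b, lower, upper and the stopping rule are placeholder assumptions, not the thesis's notation.

    import numpy as np

    def projected_bb(A, b, lower, upper, x0, max_iter=200, tol=1e-8):
        # Projected Barzilai-Borwein iteration for the box-constrained QP
        #     minimize 0.5 * x'Ax - b'x   subject to   lower <= x <= upper,
        # with A symmetric positive semidefinite (placeholder problem data).
        project = lambda z: np.clip(z, lower, upper)
        x = project(np.asarray(x0, dtype=float))
        g = A @ x - b                      # gradient of the quadratic objective
        step = 1.0                         # initial step length
        for _ in range(max_iter):
            x_new = project(x - step * g)  # gradient step followed by box projection
            if np.linalg.norm(x_new - x) < tol:
                break
            g_new = A @ x_new - b
            s, y = x_new - x, g_new - g    # BB step length from the most recent move
            sy = float(s @ y)
            step = float(s @ s) / sy if sy > 1e-12 else 1.0
            x, g = x_new, g_new
        return x

For example, with A = M'M + I for a random matrix M, a random b, and bounds [0, 1], the iterate settles on the box-constrained minimizer within a few dozen steps.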
(3) We design a novel supervised dimensionality reduction method for supervised, paired multi-view data. Inspired by the success of supervised manifold learning and correlation analysis, we devise supervised locality preserving canonical correlation analysis (SLPCCA). The algorithm uses discriminative structural information to construct a class-information matrix and combines the correlations of neighboring samples to construct a similarity matrix. As a result, SLPCCA not only improves the ability of CCA to handle nonlinear problems, by infusing local structural information and breaking the linearity restriction, but also overcomes the shortcoming of LPCCA, which neglects class information. The resulting features are more favorable for classification than those of LPCCA. Experimental results on the MFD and USPS datasets demonstrate the superiority of the proposed SLPCCA over CCA and LPCCA.

(4) We propose a novel dimensionality reduction method for semi-supervised, semi-paired multi-view data. Facing the semi-paired and semi-supervised multi-view data that widely exist in real-world applications, CCA usually performs poorly because it requires the data to be paired across views and is unsupervised in nature. Several extensions of CCA have recently been proposed, but they handle either the semi-paired scenario or the semi-supervised scenario, not both. We present a general dimensionality reduction framework for semi-paired and semi-supervised multi-view data that naturally generalizes existing related work by exploiting different kinds of prior information. Based on this framework, we develop a novel dimensionality reduction method termed semi-paired and semi-supervised generalized correlation analysis (S2GCA). S2GCA exploits a small amount of paired data to perform CCA and, at the same time, uses both the global structural information captured from the unlabeled data and the local discriminative information from the limited labeled data to compensate for the limited pairing. Consequently, S2GCA can find directions that ensure not only maximal correlation between the paired data but also maximal separability of the labeled data and preservation of the global structure of each view. Experimental results on artificial data and four real-world datasets show its effectiveness compared with existing related dimensionality reduction methods.
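All four contributions build on classical CCA, which the abstract assumes as background. As a point of reference only, below is a minimal sketch of two-view CCA via its generalized eigenvalue formulation, with a small ridge term added to the view covariances for numerical stability; the function name, the regularization value, and the use of SciPy are assumptions, not part of the thesis.

    import numpy as np
    from scipy.linalg import eigh

    def cca(X, Y, dim=2, reg=1e-6):
        # Classical two-view CCA: find projection directions Wx, Wy such that the
        # projected views X @ Wx and Y @ Wy are maximally correlated.
        X = X - X.mean(axis=0)
        Y = Y - Y.mean(axis=0)
        n = X.shape[0]
        Cxx = X.T @ X / n + reg * np.eye(X.shape[1])   # regularized within-view covariances
        Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
        Cxy = X.T @ Y / n                              # cross-view covariance
        # Generalized eigenproblem: Cxy Cyy^{-1} Cyx wx = rho^2 Cxx wx
        M = Cxy @ np.linalg.solve(Cyy, Cxy.T)
        vals, vecs = eigh(M, Cxx)                      # eigenvalues in ascending order
        order = np.argsort(vals)[::-1][:dim]           # keep the top-`dim` directions
        Wx = vecs[:, order]
        Wy = np.linalg.solve(Cyy, Cxy.T @ Wx)          # matching y-side directions (up to scale)
        return Wx, Wy

Supplying a one-of-C label-encoding matrix as the second view Y puts this in the setting of the CCA/LDA equivalence discussed in contribution (1).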
Keywords/Search Tags: Dimensionality Reduction, Curse of Dimensionality, Multi-view Data, Single-view Data, Semi-supervised Learning, Manifold Learning, Canonical Correlation Analysis, Principal Component Analysis, Linear Discriminant Analysis, Discriminant Information