
Research And Application Of Partial Least Squares Based Dimension Reduction

Posted on: 2010-06-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Q Ceng
GTID: 1118360278976356
Subject: Information and Communication Engineering

Abstract/Summary:
As the dimensionality of data grows rapidly, dimension reduction has become an important data processing technique. It can improve the generalization performance of a learner, reduce the computational cost of modeling, and make the data easier to interpret. Among the many dimension reduction algorithms, PLSDR (Partial Least Squares based Dimension Reduction) is one of the most effective. By investigating the theory of the PLSDR method, this dissertation proposes a novel Partial Least Squares based Dimension Reduction Framework and designs a number of algorithms under it. Its contributions are as follows.

1) A novel Partial Least Squares based Dimension Reduction Framework. The elimination of irrelevant or redundant features and the selection of latent components are two important issues for PLSDR that previous work has often overlooked. To address them, the proposed framework integrates feature pre-selection and model selection with the PLSDR method.

2) Several feature pre-selection algorithms under the PLSDR framework. Feature pre-selection, whose goal is to eliminate irrelevant and redundant features beforehand, has a great impact on the performance of PLSDR. For eliminating irrelevant features, this dissertation proposes the PLSDR-G algorithm, which identifies irrelevant features by means of a probe variable and its t-statistic score. For reducing redundant features, it designs a novel metric that estimates feature redundancy based on discriminative contribution and, building on that metric, proposes the REDISC (Redundancy elimination based on discriminative contribution) method.

3) Several model selection algorithms under the PLSDR framework. After the latent components have been extracted, a model selection algorithm determines the reduced space by selecting a subset of them. First, two model selection methods based on the goodness of fit of regression, R_y^2, are proposed: the PAS (PLSDR with model selection by using Absolute R_y^2 Scores) algorithm and the PIS (PLSDR with model selection by using Incremental R_y^2 Scores) algorithm. Second, the FSBFE (Feature Selection Based Feature Extraction) algorithm is proposed, which uses a genetic algorithm to embed the learner into the PLSDR model.

4) Application of the PLSDR method to text classification. Extracting latent concepts from text is an effective way to handle synonymy and polysemy. However, the existing LSI (Latent Semantic Indexing) method performs relatively poorly in text classification because it does not take label information into account. To address this problem, this dissertation proposes the SIPLS (Semantic Indexing based on Partial Least Squares) method and the LSIPLS (Local Semantic Indexing based on Partial Least Squares) method. Both methods exhibited better performance in the experiments.
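
As a rough illustration of the model selection idea in contribution 3, the sketch below extracts latent components with a single-response NIPALS PLS iteration and ranks them by the R_y^2 each one achieves on the class label, keeping the top-scoring ones. It is a minimal sketch under assumptions, not the PAS or PIS algorithm itself: the abstract does not give their details, and the function names (`pls1_scores`, `select_by_r2`), the toy data, and the top-k selection rule are all hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def column(X, j):
    return [row[j] for row in X]

def center_columns(X):
    n, d = len(X), len(X[0])
    means = [sum(column(X, j)) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]

def pls1_scores(X, y, n_components):
    """NIPALS PLS for one response: return score vectors and the
    R^2 of the centered response on each score vector."""
    Xc = center_columns(X)
    y_mean = sum(y) / len(y)
    y0 = [v - y_mean for v in y]      # centered response
    yr = y0[:]                        # residual response, deflated each step
    total = dot(y0, y0)
    d = len(Xc[0])
    scores, r2 = [], []
    for _ in range(n_components):
        # Weight vector: direction of maximal covariance with the response.
        w = [dot(column(Xc, j), yr) for j in range(d)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in Xc]                 # latent component
        tt = dot(t, t)
        if tt < 1e-12:
            break
        p = [dot(column(Xc, j), t) / tt for j in range(d)]  # X loadings
        q = dot(yr, t) / tt                                 # y loading
        # Deflate X and y so the next component is orthogonal to this one.
        Xc = [[row[j] - ti * p[j] for j in range(d)] for row, ti in zip(Xc, t)]
        yr = [v - q * ti for v, ti in zip(yr, t)]
        scores.append(t)
        r2.append(dot(t, y0) ** 2 / (tt * total))
    return scores, r2

def select_by_r2(r2, keep):
    """Hypothetical rule: indices of the `keep` components with largest R^2."""
    ranked = sorted(range(len(r2)), key=lambda i: -r2[i])
    return sorted(ranked[:keep])

# Toy data (made up for illustration): 6 samples, 3 features, binary label.
X = [[2.0, 1.0, 0.5], [1.5, 0.8, 1.0], [1.8, 1.2, 0.2],
     [0.3, 1.1, 0.9], [0.5, 0.9, 0.4], [0.1, 1.0, 0.7]]
y = [1, 1, 1, 0, 0, 0]

scores, r2 = pls1_scores(X, y, n_components=2)
chosen = select_by_r2(r2, keep=1)
print("R^2 per component:", r2)
print("selected components:", chosen)
```

Because the score vectors produced this way are mutually orthogonal, each component's R_y^2 is also its incremental contribution to the overall fit of the regression on all extracted components.
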
Keywords/Search Tags: Dimension Reduction, Feature Extraction, Partial Least Squares based Dimension Reduction