
Research and Application of Unsupervised Feature Reduction

Posted on: 2009-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: P Zhang
Full Text: PDF
GTID: 2178360272485710
Subject: Computer application technology
Abstract/Summary:
In many areas of machine learning, pattern recognition, information retrieval, and bioinformatics, one is often confronted with massive high-dimensional datasets, which lead to the curse of dimensionality. Learning machines are computationally expensive in high-dimensional feature spaces, and noisy features degrade the performance of learning algorithms. To address these problems, feature reduction maps the original feature space into a low-dimensional space in which the information important for subsequent learning tasks is preserved. Feature reduction can be broadly divided into two categories: feature extraction and feature selection. Feature extraction seeks a linear or nonlinear combination of the original features and decorrelates the dependencies among them. Feature selection identifies and suppresses features that are not discriminative of the true classes. In the unsupervised setting, the absence of class information makes feature reduction, and especially feature selection, a real challenge.

Manifold learning is an important branch of feature extraction. In this dissertation, we propose a novel manifold learning method called Locally Linear Inlaying (LLI). The basic assumption of manifold learning algorithms is that the input data lie on or close to a low-dimensional nonlinear manifold. Adopting a divide-and-conquer strategy, LLI first embeds local linear patches and then inlays them globally. LLI greatly improves the time complexity and robustness of manifold learning: first, its time complexity is linear in the number of data points; second, it overcomes problems caused by non-uniform sample distributions; third, it is robust to both homogeneous and heterogeneous noise. We demonstrate the efficiency and effectiveness of LLI on synthetic and real-world face datasets.

As for feature selection, the original feature set contains many noisy features, which seriously distort any reasonable distance metric (or relevance metric). Most existing feature selection methods lack metric invariance and are therefore susceptible to strongly irrelevant distance metrics. In this dissertation, we propose a metric-invariant approach to dealing with irrelevant distance metrics. The key observation is that if the statistic guiding unsupervised feature selection is invariant under metric scaling, then the solution of the feature selection model is also invariant; hence, if the model works on a relevant feature space, it will still work on any irrelevant feature space obtained from the relevant one by metric scaling. We give a theoretical justification of the invariance of our model, and experiments on synthetic and real-world text datasets are also promising.
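The divide-and-conquer structure described for LLI (embed local linear patches, then align them globally) can be illustrated with a minimal Python sketch. This is not the dissertation's LLI algorithm: the partition into patches here uses k-means, the local embedding uses PCA as a stand-in for the local linear step, the global inlaying (alignment) stage is omitted entirely, and the function name `local_linear_patches` is ours.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def local_linear_patches(X, n_patches=8, d=2, seed=0):
    """Sketch of the divide step only: partition the data into local
    patches and embed each patch linearly. LLI's global inlaying
    (stitching the patch coordinates together) is not reproduced."""
    labels = KMeans(n_clusters=n_patches, random_state=seed, n_init=10).fit_predict(X)
    embeddings = np.zeros((X.shape[0], d))
    for k in range(n_patches):
        idx = np.where(labels == k)[0]
        # PCA as a stand-in for the local linear embedding of one patch.
        embeddings[idx] = PCA(n_components=d).fit_transform(X[idx])
    return labels, embeddings

# Toy usage on a Swiss-roll-like dataset.
rng = np.random.default_rng(0)
t = 3 * np.pi * (1 + 2 * rng.random(1000))
X = np.column_stack([t * np.cos(t), 20 * rng.random(1000), t * np.sin(t)])
labels, Y = local_linear_patches(X)
print(Y.shape)  # (1000, 2): per-patch 2-D coordinates, not yet aligned
```

With a fixed number of patches, the per-patch work above grows linearly with the number of points, which is consistent with (though not a proof of) the linear time complexity claimed for LLI.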
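The metric-invariance argument can likewise be illustrated. The abstract does not specify the statistic used in the dissertation, so the sketch below uses standardized kurtosis as a hypothetical stand-in for a scale-invariant feature score: standardization cancels any per-feature metric scaling, so the induced feature ranking cannot be disturbed by an irrelevant metric, whereas a variance-based score is not invariant.

```python
import numpy as np

def variance_score(X):
    # NOT metric-invariant: rescaling a feature rescales its score.
    return X.var(axis=0)

def kurtosis_score(X):
    # Standardized fourth moment: invariant under per-feature scaling,
    # since (cX - c*mean) / (c*std) == (X - mean) / std for c > 0.
    # A stand-in for the (unspecified) statistic in the dissertation.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    return (Z ** 4).mean(axis=0)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 6)) * rng.uniform(0.5, 2.0, 6)
scales = rng.uniform(0.1, 10.0, 6)  # an arbitrary per-feature metric scaling
Xs = X * scales

print(np.argsort(variance_score(X)), np.argsort(variance_score(Xs)))  # rankings typically differ
print(np.argsort(kurtosis_score(X)), np.argsort(kurtosis_score(Xs)))  # rankings agree
```

The point of the example is the comparison, not the particular score: any statistic that is invariant under metric scaling yields a feature selection whose solution is unchanged when an irrelevant metric is applied, which is exactly the property argued for in the abstract.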
Keywords/Search Tags: Unsupervised Learning, Feature Reduction, Feature Extraction, Feature Selection, Manifold Learning, Locally Linear Inlaying, Metric Invariance