
Research On Dimensionality Reduction Of High-Dimensional Data

Posted on: 2013-03-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y R Su
Full Text: PDF
GTID: 1228330377951759
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
In the era of "information explosion", we often face the analysis and processing of diverse data, such as massive web data, large-scale text databases, and large numbers of remote sensing images, which pose challenges to statistics, pattern recognition, artificial intelligence, data mining, machine learning, and other related disciplines. Such data grow rapidly in geometric progression and usually have high dimensionality. High-dimensional data often lead to the so-called "curse of dimensionality". On the one hand, they are usually sparse and redundant, which tends to conceal the true structure of the data and can even lead to erroneous results. On the other hand, they increase the burden of analyzing and processing the data. As an effective way to combat the curse of dimensionality, dimensionality reduction has become an important research topic: it maps a high-dimensional feature space into a low-dimensional feature space that better reflects the intrinsic structure of the data and improves the efficiency of data analysis and processing. In this dissertation, we conducted thorough research on the theory and applications of dimensionality reduction.

1. A novel subspace learning method with dynamic optimization for feature extraction was proposed. The method searches for the optimal coefficient to balance the objective functions of principal component analysis (PCA) and the maximum margin criterion (MMC). With PCA, the linear projection represents the original data structure while taking discriminant information into account; with MMC, the projection better expresses the original structure of the samples while simultaneously reducing the feature dimensionality and extracting class information.
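The balancing idea above can be sketched as follows. This is an illustrative reconstruction, not the dissertation's actual algorithm: it assumes PCA is represented by the total-scatter criterion tr(WᵀS_t W), MMC by tr(Wᵀ(S_b − S_w)W), and that the two are blended linearly by a coefficient `alpha` before a single eigendecomposition; the function name and this exact blend are assumptions.

```python
import numpy as np

def combined_pca_mmc(X, y, alpha, k):
    """Hypothetical sketch: blend PCA's total-scatter objective with
    MMC's (between-class - within-class) scatter via a coefficient
    `alpha`, then take the top-k eigenvectors of the blended matrix.
    Columns of the returned W span the reduced k-dimensional subspace."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Total scatter (PCA criterion): S_t = sum_i (x_i - mean)(x_i - mean)^T
    St = (X - mean).T @ (X - mean)
    # Between-class (S_b) and within-class (S_w) scatter (MMC criterion)
    Sb = np.zeros_like(St)
    Sw = np.zeros_like(St)
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        d = (mc - mean)[:, None]
        Sb += len(Xc) * (d @ d.T)
        Sw += (Xc - mc).T @ (Xc - mc)
    # Blend the two criteria; larger alpha favors PCA's representation,
    # smaller alpha favors MMC's discrimination
    M = alpha * St + (1.0 - alpha) * (Sb - Sw)
    vals, vecs = np.linalg.eigh(M)       # symmetric eigendecomposition
    W = vecs[:, np.argsort(vals)[::-1][:k]]
    return W
```

Because `M` is symmetric, `np.linalg.eigh` yields orthonormal eigenvectors, so the projection `X @ W` preserves scale along each retained direction.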
In addition, different data, or even data of the same kind under different conditions, may have different structural characteristics, so an algorithm should be developed according to the structural characteristics of the data; the proposed method meets this requirement. Finally, tumor classification experiments on gene microarray data verified that the new feature extraction method is effective and stable.

2. An unsupervised feature selection algorithm based on sparse representation, called Sparse Score (SS), was proposed. Although sparse representation is a global method in nature, it possesses certain discriminating and local properties, which give SS not only strong discriminating ability but also the ability to preserve local structure and some global structure. In addition, SS selects features with relatively large variance, i.e., large information content. Clustering experiments on face images show that, in evaluating feature significance, SS significantly outperforms two other feature selection algorithms, Variance Score (VS) and Laplacian Score (LS).

3. A supervised feature extraction algorithm based on low-rank discriminant projection (LRDP) was proposed. Built on low-rank representation, LRDP represents global structure well along with a certain discriminative structure. In addition, LRDP builds on the decision rule of the sparse representation-based classifier (SRC), which gives it good discriminating ability. Classification experiments on face images reveal that LRDP outperforms several other feature extraction algorithms, including PCA, LDA, and sparsity preserving projections (SPP).

4. Based on ontology theory, a novel method of ontology-based feature optimization for agricultural text was proposed. First, the terms of the vector space model are replaced by concepts from an agricultural ontology, where the concept frequency weights are computed statistically from the term frequency weights.
Second, concept similarity weights are assigned to the concepts through the concept hierarchy of the agricultural ontology. By combining the feature frequency weights and feature similarity weights based on the agricultural ontology, the dimensionality of the feature space can be reduced dramatically, and semantic information can be incorporated into the feature space. Finally, agricultural text clustering experiments were carried out to verify the effectiveness of this method.
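The two weighting steps above can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the function name, the dictionaries `term_to_concept` and `concept_sim`, and the linear blend `lam` between frequency and similarity weights are all assumptions made for the example.

```python
from collections import Counter

def concept_weights(term_counts, term_to_concept, concept_sim, lam=0.5):
    """Hypothetical sketch of ontology-based feature optimization:
    map term frequencies onto ontology concepts, then blend the
    concept-frequency weights with hierarchy-based similarity weights.
    `term_counts`     : {term: raw frequency} for one document
    `term_to_concept` : {term: ontology concept} mapping
    `concept_sim`     : {concept: similarity weight from the hierarchy}
    `lam`             : illustrative blend coefficient (assumption)"""
    # Aggregate term frequencies into concept frequencies; replacing many
    # terms by fewer shared concepts is what shrinks the feature space
    cf = Counter()
    total = 0
    for term, count in term_counts.items():
        concept = term_to_concept.get(term)
        if concept is not None:  # terms outside the ontology are dropped
            cf[concept] += count
            total += count
    # Blend normalized concept frequency with the similarity weight
    return {c: lam * (n / total) + (1 - lam) * concept_sim.get(c, 0.0)
            for c, n in cf.items()}
```

Mapping several surface terms (e.g. distinct crop names) to one ontology concept both reduces dimensionality and injects semantic information, since documents using different vocabulary for the same concept now share a feature.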
Keywords/Search Tags: dimensionality reduction, feature selection, feature extraction, sparse representation, low-rank representation, agricultural ontology