Theoretical Analysis Of Feature Selection Based On Local Structure And Its Application

Posted on: 2016-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: X P Cheng
Full Text: PDF
GTID: 2308330479993932
Subject: Computer application technology
Abstract/Summary:
With the rapid development of modern technology, large-scale data such as high-throughput data are accumulating in many fields, including bioinformatics, medical image processing, information retrieval and data mining. Because such high-throughput data combine small sample sizes with high-dimensional features, their analysis is severely limited by the “curse of dimensionality”. Preprocessing such data with efficient feature selection methods that remove irrelevant and redundant features remains a challenging topic. This thesis aims to provide efficient and accurate models for feature selection. The main contributions are in two aspects:
1. A feature selection method called local hyperplane-based discriminant analysis (LHDA) is proposed. LHDA rests on two central ideas: 1) it approximates the data with local structures rather than a global one; 2) it embeds a recently reported classification model, the K-Local Hyperplane Distance Nearest Neighbor (HKNN) classifier, into its discriminator and obtains the optimal feature subset by minimizing the leave-one-out cross-validation (LOOCV) training error function. The proposed method is evaluated in extensive experiments on synthetic and real microarray benchmark datasets, with eight classical feature selection methods, four classification models and two popular embedded learning schemes used for comparison. Experimental results show that the proposed method performs comparably to or better than several state-of-the-art models with respect to accuracy, feature subset size and robustness to noise. For high-throughput data in particular, the LHDA algorithm shows good stability and resistance to noise degradation.
2. This thesis also presents an ensemble feature selection model designed for high-throughput data, in which the base learner is random selection. Experimental results show that the accuracy of the random subspace based ensemble feature selection algorithm is significantly higher than that of the other methods. The proposed method is promising for high-throughput data analysis.
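To make the first contribution more concrete, the following is a minimal sketch of an HKNN-style classifier applied in a feature-weighted space, together with its LOOCV error. It is an illustration under stated assumptions, not the thesis's implementation: the abstract does not give the LHDA objective or how the weights are optimized, so the weights here are simply supplied by the caller. The function names (`solve_linear`, `hyperplane_distance`, `hknn_predict`, `loocv_error`), the ridge parameter `lam`, and the square-root scaling of features by their weights are all hypothetical choices made for this example.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def hyperplane_distance(q, neighbors, weights, lam=1e-3):
    """Distance from q to the local hyperplane (affine hull) of `neighbors`.

    Assumption: each feature j is scaled by sqrt(weights[j]), and the
    hyperplane coefficients are ridge-regularized with lam > 0 so the
    linear system is always solvable.
    """
    k, d = len(neighbors), len(q)
    scale = [math.sqrt(w) for w in weights]
    centroid = [sum(p[j] for p in neighbors) / k for j in range(d)]
    # Directions spanning the hyperplane, and the query offset, both scaled.
    v = [[scale[j] * (p[j] - centroid[j]) for j in range(d)] for p in neighbors]
    r = [scale[j] * (q[j] - centroid[j]) for j in range(d)]
    gram = [[sum(v[i][j] * v[l][j] for j in range(d)) + (lam if i == l else 0.0)
             for l in range(k)] for i in range(k)]
    rhs = [sum(v[i][j] * r[j] for j in range(d)) for i in range(k)]
    alpha = solve_linear(gram, rhs)
    resid = [r[j] - sum(alpha[i] * v[i][j] for i in range(k)) for j in range(d)]
    return math.sqrt(sum(x * x for x in resid))


def hknn_predict(q, X, y, weights, k=2, lam=1e-3):
    """Assign q to the class whose local hyperplane is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y)):
        members = [x for x, t in zip(X, y) if t == label]
        # k nearest same-class samples under the weighted squared distance.
        members.sort(key=lambda p: sum(w * (a - b) ** 2
                                       for w, a, b in zip(weights, p, q)))
        dist = hyperplane_distance(q, members[:k], weights, lam)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_error(X, y, weights, k=2, lam=1e-3):
    """Fraction of samples misclassified when each is held out in turn."""
    errors = 0
    for i in range(len(X)):
        X_rest, y_rest = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        if hknn_predict(X[i], X_rest, y_rest, weights, k, lam) != y[i]:
            errors += 1
    return errors / len(X)
```

In this sketch a feature with weight 0 drops out of both the neighbor search and the hyperplane distance, so `loocv_error(X, y, weights)` can be compared across candidate weight vectors. A feature selection procedure in the spirit of the abstract would search for weights that minimize this error; how LHDA performs that search is not stated here.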
Keywords/Search Tags:Feature weighting, Local hyperplane, Local learning, Feature selection