
Research On Feature Selection And Its Stability For High-dimensional Data

Posted on: 2013-07-02    Degree: Master    Type: Thesis
Country: China    Candidate: J Bao    Full Text: PDF
GTID: 2248330395952890    Subject: Computer application technology
Abstract/Summary:
Feature selection is one of the key problems in machine learning and pattern recognition. As research in these fields deepens and the objects of learning become more complex, the feature dimensionality of those objects keeps growing. A high-dimensional data set contains hundreds or thousands of features, among them a large amount of irrelevant and redundant information that can greatly degrade the performance of learning algorithms. Feature selection is therefore particularly important when dealing with high-dimensional data. Researchers at home and abroad have done extensive work on this problem, and the results have been widely applied to text classification, risk management, Web classification, medical diagnosis, biological data analysis, genome projects, and so on.

Existing studies often focus on the classification performance of feature selection while neglecting its stability, i.e., the insensitivity of the selection result to small changes in the training samples. Stability is especially important when the goal is to discover the true variables of an underlying natural model, as in biomarker identification. To improve stability, researchers have so far proposed ensemble feature selection, feature selection with prior feature relevance, group feature selection, feature selection with sample injection, and other methods. In this thesis, we study feature selection methods and their stability on high-dimensional data, focusing on three questions: how to obtain a feature subset, how to evaluate a feature subset, and how to make the selection results stable. Drawing on existing research, we improve the original feature selection methods and feature-subset evaluation criteria by means of the 1-norm SVM and ensemble learning.
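The thesis designs its own stability measure, which is not specified in this abstract. As a point of reference only, a common way to quantify selection stability is the average pairwise Jaccard similarity between the feature subsets chosen on perturbed training sets; the sketch below assumes that definition and is not the thesis's measure.

```python
def subset_stability(subsets):
    """Average pairwise Jaccard similarity between feature subsets.

    subsets: list of sets of selected feature indices, one per run
             on a perturbed training sample.
    Returns a value in [0, 1]; 1 means every run selected the same
    features, values near 0 mean the selections barely overlap.
    """
    pairs, total = 0, 0.0
    for i in range(len(subsets)):
        for j in range(i + 1, len(subsets)):
            inter = len(subsets[i] & subsets[j])
            union = len(subsets[i] | subsets[j])
            # Two empty subsets are treated as identical.
            total += inter / union if union else 1.0
            pairs += 1
    return total / pairs if pairs else 1.0
```

For example, two identical subsets give a score of 1.0, while two disjoint subsets give 0.0.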
The main work is as follows:
1. We propose SFS-tSVM, an algorithm based on an improved SVM-based evaluation criterion. We improve the existing criterion by adding a threshold, combine it with a feature selection method, and ensemble the selection results by data perturbation. We also design a new stability measure for high-dimensional data. Experiments show that SFS-tSVM effectively improves the stability of feature selection.
2. We propose MFS-tSVM, an ensemble algorithm based on multiple feature selectors and the improved SVM-based evaluation criterion. The results of the multiple feature selectors are combined by function perturbation. Experiments show that this algorithm effectively improves the stability of feature selection while achieving good classification accuracy.
3. We propose L1SVM-EFS, an ensemble algorithm based on the L1-norm SVM. The sparse SVM is applied to feature selection on high-dimensional data and combined with an ensemble method based on data perturbation. Experiments show that this algorithm effectively improves the stability of the selection results without decreasing classification accuracy.
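The abstract does not give the details of L1SVM-EFS, but the combination it describes (a sparse L1-penalised linear SVM plus ensembling by data perturbation) can be illustrated in scikit-learn. The sketch below is an assumed minimal version: each round fits an L1 `LinearSVC` on a bootstrap sample and the features are ranked by their average absolute weight across rounds; the function name, `n_rounds`, `top_k`, and `C` are illustrative choices, not the thesis's parameters.

```python
import numpy as np
from sklearn.svm import LinearSVC

def l1svm_ensemble_select(X, y, n_rounds=20, top_k=10, C=0.1, seed=0):
    """Ensemble feature selection with an L1-penalised linear SVM.

    Each round draws a bootstrap sample (data perturbation), fits the
    sparse SVM, and accumulates the absolute coefficient of every
    feature.  Averaging over rounds smooths out the instability of a
    single L1 fit; the top_k highest-scoring features are returned.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    scores = np.zeros(d)
    for _ in range(n_rounds):
        idx = rng.integers(0, n, size=n)              # bootstrap sample
        clf = LinearSVC(penalty="l1", dual=False, C=C, max_iter=5000)
        clf.fit(X[idx], y[idx])
        scores += np.abs(clf.coef_).sum(axis=0)       # per-feature weight
    scores /= n_rounds
    return np.argsort(scores)[::-1][:top_k]           # top-ranked features
```

Averaging weights over bootstrap rounds is one of several plausible aggregation rules; counting how often a feature receives a non-zero weight would be an equally reasonable alternative.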
Keywords/Search Tags:high-dimensional data, feature selection, stability, ensemble learning, L1SVM