Analysis Of Feature Selection Algorithm Based On Support Vector Machine

Posted on: 2011-11-03
Degree: Master
Type: Thesis
Country: China
Candidate: K Yan
GTID: 2178330332461457
Subject: Computer software and theory
Abstract/Summary:
Metabolomics is the field of science that measures the metabolites in an organism in order to study physiological processes and their responses to stimuli such as infection, disease, or drug use. Metabolomics methods usually produce data sets with a small number of samples and a large number of variables, which contain noise and redundancy. Selecting the variables that best explain the phenotypic differences is therefore very important for analyzing metabolomics data and, ultimately, for understanding the underlying complex biological processes.
Multivariate analysis and machine learning methods are necessary to analyze metabolomics data. Support vector machines (SVMs) are known to generalize very well compared with other statistical multivariate methods, and support vector machine recursive feature elimination (SVM-RFE) is one of the most efficient feature selection algorithms. The stability of feature selection techniques in particular has been receiving more and more attention recently. In this thesis, we study feature selection techniques. First, we apply SVM and SVM-RFE with different strategies to the analysis of rice sheath blight data. Compared with PLS-DA, the SVM-based approach performs better and finds smaller feature subsets. The R2/Q2 values of the model and their intercepts show that the selected feature subset not only explains the test data more clearly, but also has strong predictive power. SVM-RFE filters features in backward order: in each iteration it deletes the bottom-ranked features, as determined by the Filter-out-Factor (m > 0). However, SVM-RFE is very sensitive to m and is unstable. We therefore propose an improved SVM-RFE method based on a dynamic Filter-out-Factor (SVM-RFE-DFF). In each loop, only features that lie in a specific window and do not contribute to improving classification performance are eliminated.
Later, we applied the ensemble technique to SVM-RFE-DFF to improve its performance, including its stability.
The algorithm was then applied to a metabolic syndrome data set. Experiments showed that SVM-RFE-DFF outperforms SVM-RFE in discriminating metabolic syndrome patients from healthy controls, and that the window size influences the classification rate of SVM-RFE-DFF less than the Filter-out-Factor influences that of SVM-RFE. The ensemble technique can further improve the performance of SVM-RFE-DFF.
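
To make the backward-elimination loop of plain SVM-RFE concrete, here is a minimal sketch with a fixed Filter-out-Factor. It is not the thesis implementation. Several things are assumptions made for illustration: that m is the number of features removed per iteration, that features are ranked by the squared weight w_j^2 of a linear SVM (a commonly used criterion for SVM-RFE), the simple sub-gradient trainer, and all function names and the toy data.

```python
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, lr: float = 0.1,
                     epochs: int = 200) -> List[float]:
    """Fit a bias-free linear SVM by full-batch sub-gradient descent on the
    L2-regularised hinge loss and return the weight vector (illustrative)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]           # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1.0:                    # sample violates the margin
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def svm_rfe(X, y, m: int, n_keep: int, fit=train_linear_svm) -> List[int]:
    """Backward elimination: retrain on the surviving features, rank them by
    w_j**2, and drop the m lowest-ranked until n_keep features remain.
    Assumption: m is a count of features removed per iteration."""
    if m <= 0:
        raise ValueError("m must be positive")
    remaining = list(range(len(X[0])))          # original feature indices
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = fit(X_sub, y)
        # Positions ordered from least to most important (ties keep order).
        order = sorted(range(len(remaining)), key=lambda k: w[k] ** 2)
        n_drop = min(m, len(remaining) - n_keep)
        dropped = set(order[:n_drop])
        remaining = [f for k, f in enumerate(remaining) if k not in dropped]
    return remaining


# Hypothetical toy data: feature 0 is strongly informative, feature 2 weakly,
# features 1 and 3 are uncorrelated with the label.
X_demo = [[2.0, 1.0, 0.5, 1.0],
          [2.0, -1.0, 0.5, -1.0],
          [-2.0, 1.0, -0.5, -1.0],
          [-2.0, -1.0, -0.5, 1.0]]
y_demo = [1, 1, -1, -1]

print(svm_rfe(X_demo, y_demo, m=1, n_keep=2))   # -> [0, 2]
```

A larger m removes more features per retraining step, which is faster but coarser; this is the setting to which the abstract says plain SVM-RFE is sensitive.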
Keywords/Search Tags: Metabolomics, SVM, SVM-RFE, SVM-RFE-DFF, Ensemble Technique