
Research on Denoising Algorithms and Unbalanced Issues Based on SVM-RFE

Posted on: 2014-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: F F Yang
GTID: 2230330395499065
Subject: Computer software and theory
Abstract/Summary:
Metabolomics quantitatively analyses the metabolites in an organism and studies the relationship between metabolites and physiological or pathological changes. Metabolomics data contain a large amount of noise and many irrelevant features. Applying data mining techniques to such data helps reduce its complexity and uncover potential metabolic markers, and thus plays an important supporting role in disease diagnosis and clinical application.

SVM-RFE is a recursive feature elimination algorithm based on SVM that uses the support vectors to calculate the weight of each feature. Noise features in metabolomics data are likely to affect the construction of the optimal hyperplane in SVM and can therefore distort the evaluation of features. This thesis therefore proposes MI-SVM-RFE, a feature selection method based on mutual information and artificial variables. It uses an artificial variable and mutual information method to filter out noise before SVM-RFE feature selection, so that feature weights are calculated more accurately and the optimal feature subset can be selected. Applied to liver disease metabolomics data, MI-SVM-RFE selects 34 significantly distinguishing metabolites and distinguishes among three liver diseases with an accuracy of 74.33 ± 2.98%, better than the 72.00 ± 4.15% obtained with the original SVM-RFE.

In an SVM classifier, sample imbalance may lead to an unequal number of support vectors in each sample group or to an unbalanced distribution of support vectors, which degrades classification of the minority class. This thesis therefore introduces the idea of EFSBS into SVM-RFE and proposes the EFSBS-SVM-RFE algorithm, which helps in analysing and understanding unbalanced data and extracts biomarkers more effectively. EFSBS-SVM-RFE is applied to the chemical components of flue-cured tobacco leaves from different varieties. The 15 chemical components selected are rich in distinguishing information and better separate two varieties of tobacco leaves.

As a sequential backward feature selection method, SVM-RFE optimizes the feature space using a multivariate classifier. FFS-ACSA is a forward feature selection method whose classifier is based on a single variable, and it takes into account the complementarity between features. To better select distinguishing features and build more effective classification models, this thesis combines FFS-ACSA with SVM-RFE and proposes a combined feature selection method, Forward-RFE, which makes full use of both the forward method FFS-ACSA and the backward method SVM-RFE. The performance of the proposed algorithm is verified on four groups of public data sets.
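The abstract does not give the details of the artificial variable and mutual information filter, so the following is only a minimal sketch of one plausible reading, not the thesis's actual procedure. It assumes that artificial variables are made by shuffling each real feature (which breaks any link to the class label), that features are discretized into equal-width bins, and that a real feature is kept only if its mutual information with the label exceeds the largest value reached by any artificial variable. The function names, the bin count, the number of shuffling rounds, and the toy data are all hypothetical.

```python
import math
import random
from collections import Counter


def discretize(values, bins=3):
    """Map continuous values to equal-width bin indices 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # a constant feature carries no information
        return [0] * len(values)
    return [min(bins - 1, int((v - lo) / (hi - lo) * bins)) for v in values]


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mi_filter(columns, labels, n_rounds=20, seed=0):
    """Return indices of features that beat every artificial variable.

    Assumed rule (not taken from the thesis): the threshold is the
    largest MI observed for any shuffled copy of any feature.
    """
    rng = random.Random(seed)
    coded = [discretize(col) for col in columns]
    threshold = 0.0
    for _ in range(n_rounds):
        for col in coded:
            fake = col[:]
            rng.shuffle(fake)  # artificial variable: same values, no signal
            threshold = max(threshold, mutual_information(fake, labels))
    return [j for j, col in enumerate(coded)
            if mutual_information(col, labels) > threshold]


# Toy synthetic data: one informative feature, one noise feature,
# and one constant feature, for 60 samples in two classes.
rng = random.Random(1)
labels = [0] * 30 + [1] * 30
informative = [rng.random() + 2 * y for y in labels]
noise = [rng.random() for _ in labels]
constant = [1.0] * len(labels)
kept = mi_filter([informative, noise, constant], labels)
print(kept)
```

Under these assumptions, the surviving feature indices would then be passed to SVM-RFE, for example scikit-learn's `RFE` wrapped around a linear `SVC`. Taking the maximum over the artificial variables is a deliberately strict threshold; a percentile would be a gentler alternative.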
Keywords/Search Tags: Metabolomics/Metabonomics, SVM-RFE, Artificial Contrast Variables, Denoising, Unbalance