The Study Of SVM-RFE Based On Combinatorial Variables And Overlapping Area

Posted on: 2015-01-08  Degree: Master  Type: Thesis
Country: China  Candidate: X F Ding  Full Text: PDF
GTID: 2298330467985637  Subject: Computer application technology
Abstract/Summary:
With the development of science and technology, people must handle large volumes of complex information every day. Data mining extracts potentially useful information from such data, making analysis and interpretation more concise and easier. Feature selection is an important branch of data mining: it removes noisy and redundant features from a large feature set and retains the valuable ones, which reduces the feature dimension and simplifies the model. Metabolomics uses data mining to process its data: it studies the metabolites of organisms, analyzes the relationship between metabolites and the physiological and pathological changes of the organism, and searches for informative potential metabolic markers. It therefore plays an important supporting role in disease diagnosis and clinical application.
In metabolomics, an interaction between metabolites, which involves at least two metabolites acting together, may be a key factor characterizing cancer. Compared with single variables, combinatorial variables may therefore also provide useful potential markers for cancer. However, different methods of constructing combinatorial variables have their own characteristics and affect feature selection and sample classification differently. In this thesis, four combination methods (addition, subtraction, multiplication and logarithm-division) are used to construct combinatorial variables. For each pair of variables, the best of the four combinatorial forms is chosen to replace the original single-variable expression data. The resulting combinatorial variables are then used as input to Support Vector Machine-Recursive Feature Elimination (SVM-RFE) for feature selection and sample classification (SVM-RFE-C).
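The construction of combinatorial variables for one pair of variables can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not say how the "best" of the four forms is chosen, so the absolute Welch t-statistic between the two classes is used here purely as an assumed stand-in criterion. The names `COMBINERS`, `separation` and `best_combination`, the 0/1 class labels and the toy data are all hypothetical, and the logarithm-division form assumes strictly positive values (as with intensity data).

```python
import math
from statistics import mean, variance

# The four combination forms named in the abstract.
COMBINERS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "logdiv": lambda a, b: math.log(a / b),  # requires a > 0 and b > 0
}

def separation(values, labels):
    """Absolute Welch t-statistic between class 0 and class 1.

    ASSUMPTION: the source does not name its selection criterion;
    this statistic is only a placeholder for it.
    """
    g0 = [v for v, y in zip(values, labels) if y == 0]
    g1 = [v for v, y in zip(values, labels) if y == 1]
    se = math.sqrt(variance(g0) / len(g0) + variance(g1) / len(g1))
    return abs(mean(g0) - mean(g1)) / se if se > 0 else float("inf")

def best_combination(xi, xj, labels):
    """Return (name, values) of the best-scoring combination of one pair."""
    candidates = {
        name: [f(a, b) for a, b in zip(xi, xj)]
        for name, f in COMBINERS.items()
    }
    best = max(candidates, key=lambda name: separation(candidates[name], labels))
    return best, candidates[best]

# Toy pair of variables: neither variable separates the classes on its
# own scale, but their ratio does (about 0.5 in class 0, about 2 in class 1).
xi = [1.0, 2.0, 4.0, 3.0, 2.0, 4.0, 8.0, 6.0]
xj = [2.1, 3.9, 8.2, 5.8, 1.0, 2.1, 3.9, 3.1]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

name, values = best_combination(xi, xj, labels)
print(name, [round(v, 3) for v in values])
```

In a full pipeline, the selected combination for each pair would replace the original single-variable columns before the data are passed to SVM-RFE.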
Experimental results on a data set acquired by liquid chromatography-mass spectrometry demonstrate the effectiveness of the combinatorial variables, which capture information with significant differences during feature selection.
SVM-RFE is a recursive feature elimination algorithm based on SVM and a sequential backward feature selection method. The algorithm uses the support vectors to calculate the weight of each feature and iteratively deletes the worst feature in the current set, so that the feature space is optimized. Feature weights in SVM measure the importance of each feature for sample classification, while the Overlapping Area (OA) of a feature characterizes the feature and its relationship with the class labels, so it is an important criterion for evaluating how a feature is distributed across the samples. To better screen discriminative features and establish a more effective classification model, this thesis combines OA with SVM-RFE to evaluate each feature in the current feature space and proposes a feature selection method, OA-SVM-RFE. Test results on five public data sets and an ovarian data set show that OA-SVM-RFE performs better than the original SVM-RFE algorithm.
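The recursive elimination loop, with an overlap term added to the ranking, can be sketched as below. Several parts are assumptions rather than the thesis's method: the abstract defines neither OA nor how it is combined with the SVM weights, so OA is taken here as the shared fraction of the two classes' value ranges, and the ranking score is taken as w_i^2 * (1 - OA_i), where w_i^2 is the usual SVM-RFE criterion for a linear SVM. The linear SVM itself is replaced by a plain subgradient descent on the regularised hinge loss so that the sketch needs only the standard library. All function names, hyperparameters and data are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Stand-in linear SVM: full-batch subgradient descent on the
    L2-regularised hinge loss. Labels y must be -1 or +1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # this sample violates the margin
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def overlap_area(values, labels):
    """ASSUMED OA definition: fraction of the overall value range that
    is shared by the [min, max] intervals of the two classes."""
    pos = [v for v, c in zip(values, labels) if c == 1]
    neg = [v for v, c in zip(values, labels) if c == -1]
    shared = min(max(pos), max(neg)) - max(min(pos), min(neg))
    total = max(values) - min(values)
    return max(shared, 0.0) / total if total > 0 else 1.0

def oa_svm_rfe(X, y):
    """Return feature indices in elimination order (least useful first)."""
    d = len(X[0])
    oa = [overlap_area([row[j] for row in X], y) for j in range(d)]
    remaining, eliminated = list(range(d)), []
    while len(remaining) > 1:
        sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(sub, y)
        # ASSUMED combination rule: weight importance, discounted by overlap.
        scores = [w[k] ** 2 * (1.0 - oa[j]) for k, j in enumerate(remaining)]
        eliminated.append(remaining.pop(scores.index(min(scores))))
    return eliminated + remaining

# Toy data: feature 0 separates the classes, features 1 and 2 overlap.
X = [
    [2.0, 0.5, 0.3], [2.5, 0.1, -0.4], [3.0, 0.7, 0.1], [2.2, -0.2, -0.2],
    [-2.0, -0.5, 0.2], [-2.5, 0.2, -0.3], [-3.0, -0.6, 0.4], [-2.3, -0.1, -0.1],
]
y = [1, 1, 1, 1, -1, -1, -1, -1]

print(oa_svm_rfe(X, y))  # the last index is the feature kept longest
```

Setting every OA value to zero reduces the score to w_i^2, that is, to plain SVM-RFE, which makes it easy to compare the two rankings on the same data.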
Keywords/Search Tags: Metabolomics, SVM-RFE, Feature Selection, Combinatorial Variables, Overlapping Area