The Study Of SVM-RFE Based On Combinatorial Variables And Overlapping Area

Posted on:2015-01-08

Degree:Master

Type:Thesis

Country:China

Candidate:X F Ding

GTID:2298330467985637

Subject:Computer application technology

Abstract/Summary:

With the development of science and technology, people must deal with large amounts of complex information every day. Data mining extracts potentially useful information from such intricate data, making its analysis and interpretation more concise and easier. Feature selection is an important branch of data mining: it removes noisy and redundant features from a large feature set and retains the valuable ones, which reduces the feature dimension and simplifies the model. Metabolomics applies data mining techniques to study the metabolites in organisms, analyze the relationship between metabolites and the physiological changes caused by disease, and discover informative potential metabolic markers. It therefore plays an important supporting role in disease diagnosis and clinical application.

In metabolomics studies, interactions between metabolites, which require at least two metabolites acting together, may be a key factor characterizing cancer. Compared with single variables, combinatorial variables may therefore also provide useful potential markers for cancer. However, different methods of constructing combinatorial variables have their own characteristics and affect feature selection and sample classification differently. In this thesis, combinatorial variables are constructed by four methods: addition, subtraction, multiplication and logarithm-division. For each pair of variables, the best of these four combination forms is chosen, and the resulting combinatorial variables replace the original single-variable expression data. The combinatorial variables are then used as input to Support Vector Machine-Recursive Feature Elimination (SVM-RFE) for feature selection and sample classification (SVM-RFE-C).
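The abstract does not state how the "best" of the four forms is chosen for a pair of variables, so the sketch below assumes a hypothetical criterion: a Fisher-style score (squared difference of class means divided by the summed within-class variances) on binary labels 0/1. Logarithm-division is assumed to mean log(a / b), which requires strictly positive intensities. The names `FORMS`, `fisher_score`, `best_combination` and `build_combinatorial_matrix` are illustrative and not taken from the thesis.

```python
import math
from itertools import combinations
from statistics import mean, pvariance

# The four combination forms named in the abstract. "logdiv" is ASSUMED to
# mean log(a / b), so all values must be strictly positive.
FORMS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "logdiv": lambda a, b: math.log(a / b),
}


def fisher_score(values, labels):
    """Hypothetical selection criterion for binary labels 0/1: squared
    difference of class means over the summed within-class variances."""
    g0 = [v for v, y in zip(values, labels) if y == 0]
    g1 = [v for v, y in zip(values, labels) if y == 1]
    spread = pvariance(g0) + pvariance(g1)
    return (mean(g0) - mean(g1)) ** 2 / (spread + 1e-12)  # guard against /0


def best_combination(x, z, labels):
    """Try every form on the pair (x, z) and return (form_name, values)
    for the form with the highest score."""
    best = None
    for name, op in FORMS.items():
        combined = [op(a, b) for a, b in zip(x, z)]
        score = fisher_score(combined, labels)
        if best is None or score > best[0]:
            best = (score, name, combined)
    return best[1], best[2]


def build_combinatorial_matrix(samples, labels):
    """Replace single variables with one best-form combinatorial variable
    per feature pair (i, j), i < j. `samples` is a list of rows."""
    n_features = len(samples[0])
    columns = []
    for i, j in combinations(range(n_features), 2):
        xi = [row[i] for row in samples]
        xj = [row[j] for row in samples]
        columns.append(best_combination(xi, xj, labels)[1])
    # Transpose back to one row per sample.
    return [list(row) for row in zip(*columns)]
```

For example, with x = [1, 10, 4, 40], z = [1, 10, 1, 10] and labels [0, 0, 1, 1], the ratio x/z is constant within each class, so `best_combination` selects `"logdiv"`. With p original features, the combinatorial matrix has p(p - 1)/2 columns, which is why a subsequent selection step such as SVM-RFE is needed.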
Experimental results on a dataset acquired by liquid chromatography-mass spectrometry (LC-MS) demonstrate the effectiveness of the combinatorial variables: they capture information that differs significantly between classes during feature selection.

SVM-RFE is a recursive feature elimination algorithm based on SVM and a sequential backward feature selection method. It uses the support vectors to compute the weight of each feature and iteratively deletes the worst-ranked feature from the current feature set, thereby optimizing the feature space. In an SVM, a feature's weight measures its importance for sample classification, whereas a feature's Overlapping Area (OA) characterizes the relationship between the feature and the class labels, which makes OA an important criterion for describing how a feature is distributed across the samples. To better screen discriminative features and build a more effective classification model, this thesis combines OA with SVM-RFE to evaluate each feature in the current feature space and proposes a feature selection method, OA-SVM-RFE. Test results on five public datasets and an ovarian dataset show that OA-SVM-RFE performs better than the original SVM-RFE algorithm.
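A minimal OA-SVM-RFE loop might look like the following sketch. Several details are assumptions rather than the thesis's definitions: the linear SVM is trained by plain full-batch subgradient descent in NumPy, OA is approximated as the length of the intersection of the two classes' value ranges divided by the length of their union, and the ranking score multiplies the standard SVM-RFE criterion w_j^2 by (1 - OA_j), so that features with small weights or heavily overlapping classes are eliminated first. The function names and the `use_oa` flag are hypothetical; `use_oa=False` gives plain SVM-RFE for comparison.

```python
import numpy as np


def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Linear soft-margin SVM fitted by full-batch subgradient descent on
    0.5 * ||w||^2 + C * mean(hinge loss). Labels y must be -1/+1."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1  # samples violating the margin
        grad_w = w - C * (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -C * y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def overlap_area(feature, y):
    """ASSUMED OA: length of the overlap of the two classes' value ranges
    divided by the length of their union (0 = fully separated)."""
    a, c = feature[y == -1], feature[y == 1]
    overlap = min(a.max(), c.max()) - max(a.min(), c.min())
    union = max(a.max(), c.max()) - min(a.min(), c.min())
    if union == 0:
        return 1.0  # constant feature: classes indistinguishable
    return max(0.0, overlap) / union


def oa_svm_rfe(X, y, use_oa=True):
    """Backward elimination: retrain on the surviving features and drop the
    one with the lowest score each round. Returns indices, best first."""
    remaining = list(range(X.shape[1]))
    eliminated = []
    while remaining:
        w, _ = train_linear_svm(X[:, remaining], y)
        score = w ** 2  # classic SVM-RFE ranking criterion
        if use_oa:
            oa = np.array([overlap_area(X[:, j], y) for j in remaining])
            score = score * (1.0 - oa)  # hypothetical OA weighting
        eliminated.append(remaining.pop(int(np.argmin(score))))
    return eliminated[::-1]
```

On a toy matrix where one column separates the two classes perfectly (OA = 0) and the others overlap, both variants rank that column first. Retraining after every elimination is what distinguishes RFE from ranking all features once by their initial weights.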

Keywords/Search Tags:

Metabolomics, SVM-RFE, Feature Selection, Combinatorial Variables, Overlapping Area

Related items

1	The Feature Selection Algorithms Based On Category Overlapping Ratio And Feature's Overlapping Area
2	The Research And Application Of Feature Selection Algorithms In Mass Spectrometry Based Metabolomics Data
3	Study On Improved Feature Selection Algorithm Based On Effective Range
4	Memetic Algorithm Based Feature Weighting For High-dimensional Metabolomics Data
5	The Research Of Feature Selection Techniques Based On Category Overlap Areas And Feature’s Effective Range
6	Random Forest Feature Selection
7	Analysis Of Feature Selection Algorithm Based On Support Vector Machine
8	The Design And Implementation Of Portable Overlapping Leaf Area Measurement Instrument
9	Web-based Metabolomics Database Software Design And Implementation
10	Research On Optimization Methods For Feature Selection