The Research Of Feature Selection Algorithm Based On SVM-RFE

Posted on: 2016-10-24
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
GTID: 2308330461476455
Subject: Computer software and theory
Abstract/Summary:
With the advancement of science and technology and the use of advanced instruments, huge volumes of data are generated every day. Biological data are characterized by high feature dimensionality and small sample size, which poses a new challenge for information processing. To extract valuable information from such data, data mining techniques are applied. Data mining is a general term that covers statistical learning, machine learning, pattern recognition and related fields. Feature selection is one such technique and has been widely applied to information processing in many domains.

Feature selection aims to eliminate noisy, irrelevant, redundant and non-discriminative features, discarding the false and retaining the true. Some feature information may be lost in the process, but the selected features are those that best represent the underlying phenomenon. SVM-RFE, recursive feature elimination based on the support vector machine, has good performance and strong generalization ability. To further improve the performance of the RFE process, the paper uses simulated annealing and Pearson's correlation coefficient as criteria to re-evaluate both the deleted feature subset and the currently retained feature subset, trying to identify features that may have been misjudged as irrelevant and giving them a chance to be evaluated again. The earlier a feature was deleted, the greater its chance of being re-evaluated.
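For reference, the baseline SVM-RFE loop that the paper builds on can be sketched as follows. This is a minimal illustration and not the thesis implementation: it trains a linear SVM without a bias term using Pegasos-style stochastic subgradient descent, ranks features by squared weight, and removes the lowest-ranked feature in each round. The function names `train_linear_svm` and `svm_rfe` and the parameters `lam`, `epochs` and `n_keep` are assumptions made for this example, and the simulated annealing and Pearson re-evaluation steps described above are not included.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Fit a linear SVM (hinge loss, L2 penalty, no bias) by Pegasos-style SGD.

    X is a list of feature lists, y holds labels in {-1, +1}.
    Returns the weight vector as a list of floats.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrink step that comes from the L2 penalty.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss step, taken only for margin violations.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe(X, y, n_keep=1):
    """Recursively drop the feature with the smallest squared SVM weight.

    Returns (eliminated, remaining): feature indices in elimination order,
    and the indices that survive once n_keep features are left.
    """
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        # Ranking criterion: w_k squared; the smallest value is removed.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated, remaining
```

The `eliminated` list records the order of deletion, which is the information a re-evaluation step would need in order to favour features that were removed early.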
When the accuracy of the current subset equals that of the current best subset during the search for the best feature subset, mutual information is used to re-evaluate the relationship between each feature subset and the classes, and the subset most strongly related to the classes is selected as the true best.

With the development of analytical techniques and the growing dimensionality of gene, protein and other biological data, the data contain noisy features, irrelevant features and interrelated features, which together express complex biological phenomena. When processing high-dimensional biological samples, it is important to eliminate noisy and redundant features and to retain discriminative and interrelated features. Doing so removes noise and reveals the essence of the problem. The overlap measure can exclude noisy and irrelevant features and evaluates the relationship between features and classes, while the relationship score evaluates interrelated features. The paper evaluates each feature by a comprehensive score composed of the feature's overlap, its relationship score, and its SVM weight on the hyperplane. This helps identify different types of disease, drug efficacy and other biomarker information.

The results show that correcting the feature subset early in the RFE elimination process, and scoring features with the multi-aspect comprehensive evaluation, can significantly improve the performance of feature selection.
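The mutual-information tie-break mentioned in the abstract can be illustrated with a short sketch. The abstract does not say how the mutual information between a feature subset and the classes is computed, so this example makes two assumptions: feature values are already discretized, and a subset is scored by treating the tuple of its feature values as a single discrete variable. The names `mutual_information`, `subset_mi` and `break_tie` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Return I(X;Y) in bits for two equal-length sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum of p(x,y) * log2(p(x,y) / (p(x) * p(y))) over observed pairs.
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def subset_mi(X, y, subset):
    """Score a subset by the MI between its joint feature values and the class."""
    joint = [tuple(row[j] for j in subset) for row in X]
    return mutual_information(joint, y)


def break_tie(X, y, subset_a, subset_b):
    """Between two subsets of equal accuracy, prefer the higher MI with the class."""
    if subset_mi(X, y, subset_a) >= subset_mi(X, y, subset_b):
        return subset_a
    return subset_b
```

With few samples and many features, the joint estimate used here is biased upward, because almost every tuple of values becomes unique. A practical version would likely need a different estimator or a per-feature aggregate.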
Keywords/Search Tags: SVM-RFE, Simulated Annealing, Correlation, TSP, Overlapping