
The Research Of Ensemble Classifier And SVM-RFE Feature Selection Algorithm

Posted on: 2015-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: H Wei
Full Text: PDF
GTID: 2298330467986694
Subject: Computer application technology
Abstract/Summary:
With the development of science and technology, many fields now generate large amounts of data, and we have entered the era of big data. This flood of data brings both opportunities and challenges, and there is an urgent need to find the useful information in it to support productivity growth. Data mining developed in this context. It is an interdisciplinary field that combines pattern recognition, machine learning, statistical learning, and artificial intelligence. By analyzing the data, keeping what is essential and discarding the dross, data mining aims to uncover latent knowledge. At present, data mining is widely applied in genomics, proteomics, metabolomics, and other fields.

Classification and feature selection are basic, commonly used data mining techniques, and they play an important role in knowledge discovery and information extraction. Because different classification methods capture different discriminative information in the data, building the base classifiers with different methods increases their diversity and strength by exploiting the complementarity among the methods. This paper proposes an ensemble classifier based on multiple diverse classification methods. The ensemble consists of several ensemble base classifiers, each of which is a weighted fusion of three base classifiers: decision tree, SVM andN. Experimental results on public datasets show that, compared with other ensemble techniques and single classifiers, the proposed method improves classification accuracy in most cases.

High-dimensional data make models slower to build and often less accurate. The purpose of feature selection is to pick the useful features out of high-dimensional data and thereby improve model performance.
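
To make the weighted fusion of base classifiers mentioned above concrete, here is a minimal sketch of weighted voting over the labels predicted by three base classifiers. The abstract does not say how the weights are chosen or how the fusion is computed, so the function names (`weighted_vote`, `fuse_batch`), the example predictions, and the weights are all illustrative assumptions, not the thesis's method.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Fuse one predicted label per base classifier by weighted voting."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per base classifier")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    # The label with the highest total weight wins; ties go to the label seen first.
    return max(scores, key=scores.get)


def fuse_batch(per_classifier_predictions, weights):
    """per_classifier_predictions[i][j] is classifier i's label for sample j."""
    return [weighted_vote(column, weights)
            for column in zip(*per_classifier_predictions)]


# Hypothetical outputs of three base classifiers on four samples.
tree_pred = ["a", "b", "a", "b"]
svm_pred = ["a", "a", "b", "b"]
third_pred = ["b", "a", "b", "b"]

# Assumed weights, e.g. normalised validation accuracies of each classifier.
weights = [0.4, 0.35, 0.25]

fused = fuse_batch([tree_pred, svm_pred, third_pred], weights)
print(fused)
```

A single strongly weighted classifier can override the other two (for example weights of 0.6, 0.2, 0.2), which is the difference between weighted fusion and plain majority voting.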
SVM-RFE, recursive feature elimination based on the support vector machine, is a common feature selection technique that can effectively remove noisy and redundant features. The distribution of the samples affects the hyperplane of the SVM model and therefore the result of feature selection. This paper computes the overlap degree of each sample with a class overlap technique and selects the samples whose overlap degree falls below a preset threshold to build the SVM. The experiments report classification accuracy and an analysis of the selected features, and the results show that the proposed method improves SVM-RFE to a certain extent.

In summary, this paper first proposes an ensemble classifier built on multiple diverse classifiers, which improves classification accuracy by exploiting the complementarity among them. Second, it proposes an improved SVM-RFE that uses class overlap to characterize the sample distribution.
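
The two steps described above, filtering samples by overlap degree and then running recursive feature elimination, can be sketched roughly as follows. Several things here are assumptions: the abstract does not define the overlap degree, so this sketch uses the share of a sample's k nearest neighbours that carry a different label; `fit_weights` is a hypothetical hook where a linear SVM's weight vector would be supplied; and `mean_difference_weights` is only a stand-in for that weight vector (it is not an SVM), used so the example runs with the standard library alone. The elimination criterion, dropping the feature with the smallest squared weight, is the usual SVM-RFE ranking rule.

```python
import math


def overlap_degree(X, y, k=3):
    """Assumed overlap measure: share of the k nearest neighbours with another label."""
    degrees = []
    for i, xi in enumerate(X):
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )[:k]
        degrees.append(sum(y[j] != y[i] for j in neighbours) / len(neighbours))
    return degrees


def filter_samples(X, y, threshold, k=3):
    """Keep only the samples whose overlap degree is below the threshold."""
    keep = [i for i, d in enumerate(overlap_degree(X, y, k)) if d < threshold]
    return [X[i] for i in keep], [y[i] for i in keep]


def rfe_ranking(X, y, fit_weights):
    """Refit, drop the feature with the smallest squared weight, repeat.

    Returns the original feature indices, most important first.
    """
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        sub = [[row[f] for f in remaining] for row in X]
        w = fit_weights(sub, y)
        worst = min(range(len(remaining)), key=lambda idx: w[idx] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]


def mean_difference_weights(X, y):
    """Stand-in for a linear SVM weight vector (NOT an SVM): difference of class means."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label != 1]
    return [
        sum(r[f] for r in pos) / len(pos) - sum(r[f] for r in neg) / len(neg)
        for f in range(len(X[0]))
    ]


# Toy data: feature 0 separates the classes, feature 1 helps a little, feature 2 is noise.
X = [
    [0.0, 1.0, 0.5], [0.2, 1.1, 0.4], [0.1, 0.9, 0.6],  # class 0
    [5.0, 2.0, 0.5], [5.2, 2.1, 0.6], [5.1, 1.9, 0.4],  # class 1
    [0.1, 1.0, 0.5],  # labelled 1 but located inside the class-0 cluster
]
y = [0, 0, 0, 1, 1, 1, 1]

degrees = overlap_degree(X, y, k=3)
X_f, y_f = filter_samples(X, y, threshold=0.5, k=3)
ranking = rfe_ranking(X_f, y_f, mean_difference_weights)
print(len(X_f), ranking)
```

In this toy run the last sample is surrounded entirely by the other class, so the filter removes it before the ranking is computed. Swapping `mean_difference_weights` for a function that trains a linear SVM and returns its weights would turn the loop into SVM-RFE proper.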
Keywords/Search Tags:Data Mining, Ensemble Classifier, Class Overlap, SVM-RFE