
Feature Selection Based Ensemble Classification And Its Application

Posted on: 2020-10-11    Degree: Master    Type: Thesis
Country: China    Candidate: S C Yu    Full Text: PDF
GTID: 2428330590951116    Subject: Computer Science and Technology
Abstract/Summary:
The scale of data in the real world continues to expand due to the rapid development and wide application of computer technology, improved data collection methods, and the interpenetration of computer science with fields such as medical science, economics, and bioinformatics. Processing such data with computers raises several difficulties: the amount of data may far exceed the memory capacity of the computing device, data samples may have numerous features that lack reasonable semantic interpretation, and the spatial distribution of the data may be sparse and irregular. Since traditional statistical methods such as regression analysis and Bayesian decision theory do not work well on today's high-dimensional data, how to extract effective information from large amounts of complex data and then rationally design a machine learning program is a very practical problem. To make computers work better on complex and sparse high-dimensional data, we can start from two aspects. One is to reduce the feature dimension of high-dimensional data; commonly used methods are feature selection and feature transformation. The other is to fuse multiple learners that analyze the data from different perspectives, that is, ensemble learning. The major work of this thesis is to explore ways of combining feature selection and ensemble learning to deal with classification problems on complex data. Specifically, two solutions are proposed.

The first is randomized feature selection (randomized reduction), which can be used to produce different feature-subset/classifier pairs and to ensemble them. Some deterministic classification models (such as the Naive Bayesian model and the nearest neighbor classification model) cannot produce different training results from the same training data; that is, they cannot meet the diversity requirement of ensemble learning models. Several different feature subsets can, however, be created via randomized feature selection, and pairs of a feature subset with the original base classifier are then able to satisfy that diversity requirement. In other words, randomized feature selection provides a feasible scheme for building ensemble classification models from deterministic classifiers. Following this idea, an ensemble strategy for the neighborhood classifier is proposed based on randomized reduction. Firstly, a random parameter is introduced into the heuristic process for computing multiple different reducts that reduce the decision error of the neighborhood classifier. Secondly, voting is employed to fuse the neighborhood classification results produced by these reducts. Experimental results on 12 UCI data sets show that the proposed strategy improves not only the classification accuracy but also the classification robustness. This study provides a technique for studying rough set theory via an ensemble learning strategy.

The second is to enhance the accuracy of the individual classifiers in an ensemble model by applying feature selection, and thereby to enhance the accuracy of the whole model. Because the original features of the data may contain redundant features that degrade classification ability, eliminating them by feature selection can achieve better classification performance on a specific classifier than using the original feature set. Following this idea, we propose a strategy of ensemble-based error-minimization reduction for the extreme learning machine (ELM). In this strategy, we improve the classification ability of the classical voting-based ELM ensemble model by selecting, for each individual ELM, the features that reduce its error. The selection method is the well-known wrapper method, and the error on each ELM includes both the generalization error and the empirical error. Experiments on 6 UCI data sets show that the proposed method has better classification ability than the classical one under the same conditions. In addition, to apply the proposed solution to a real problem, we applied it to protein secondary structure prediction and proposed a feasible prediction scheme.
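The core idea shared by both solutions can be illustrated with a minimal sketch: train each base classifier on a randomly drawn feature subset (a stand-in for the thesis's randomized reducts) and fuse the member predictions by majority voting. This is an illustrative assumption, not the thesis's exact algorithm: a 1-nearest-neighbour rule stands in for the neighborhood classifier, and the function names (`nn_predict`, `random_subset_ensemble_predict`) and parameters (`n_members`, `subset_size`) are hypothetical.

```python
import random
from collections import Counter

def nn_predict(train_X, train_y, x, features):
    # Classify x by its nearest training sample, measuring distance
    # only over the given feature subset (squared Euclidean distance).
    def dist(a, b):
        return sum((a[f] - b[f]) ** 2 for f in features)
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    return train_y[best]

def random_subset_ensemble_predict(train_X, train_y, x,
                                   n_members=5, subset_size=2, seed=0):
    # Draw n_members random feature subsets -- a simplified analogue
    # of computing multiple randomized reducts -- and let each
    # subset/classifier pair vote on the label of x.
    rng = random.Random(seed)
    n_features = len(train_X[0])
    votes = []
    for _ in range(n_members):
        features = rng.sample(range(n_features), subset_size)
        votes.append(nn_predict(train_X, train_y, x, features))
    # Majority voting fuses the member predictions.
    return Counter(votes).most_common(1)[0][0]
```

In this sketch the diversity required by the ensemble comes entirely from the random feature subsets, since the base classifier itself is deterministic; this mirrors the abstract's point that randomized reduction is what makes deterministic classifiers usable in an ensemble.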
Keywords/Search Tags: Feature Selection, Ensemble Learning, Neighborhood Classifier, Extreme Learning Machine, Protein Secondary Structure Prediction, Decision Error Rate