
Research On Extreme Learning Machine For Imbalanced Data Classification

Posted on: 2019-02-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X F Tang
GTID: 1368330545953339
Subject: Computer software and theory
Abstract/Summary:
Class-imbalance learning is one of the most significant research topics in data mining and pattern recognition. An imbalance problem occurs when one class has many more samples than the other class or classes, and class-imbalanced problems arise in widely diverse fields. Many of the approaches that scholars have proposed for imbalanced data classification are designed to optimize the overall accuracy rate, which biases them toward the majority class and lowers their sensitivity in detecting the minority class. How to improve classification accuracy on the minority class while maintaining overall classification performance therefore remains a problem to be solved.

The extreme learning machine (ELM) is well suited to classification problems owing to its extremely fast learning, flexibility, strong pattern-recognition performance, and generalization ability. However, ELM for imbalanced learning is a new research topic with several limitations: it requires more hidden nodes than conventional tuning-based algorithms; its performance is unstable because the hidden node parameters are chosen randomly; and its generalization performance is poor because the training sample weights are generated from the sample distribution alone. In addition, the generalization performance of ELM is affected by sample noise. Previous studies have reported that traditional local optimization methods have difficulty solving such problems efficiently. Intelligent optimization algorithms, which are inspired by mechanisms found in natural characteristics and behaviors, are efficient at such problems and are widely used as global search methods for optimizing the parameters of neural networks. Ensemble learning, in turn, is used to reduce the influence of the randomly chosen hidden node parameters and so stabilize the performance of the network.

This study investigates how to improve ELM so that it achieves good generalization performance. Three methods based on the weighted ELM (WELM) are proposed for imbalance learning, together with a model that improves the prediction of pupylation sites. The main results are as follows:

(1) A self-adaptive differential evolution weighted extreme learning machine (SDE-WELM) algorithm is proposed. It uses a self-adaptive differential evolution algorithm to find the optimal input weights and hidden biases (the hidden node parameters) as well as the training sample weights of the WELM, and it adopts a criterion suited to binary imbalance learning as the fitness function. The method reduces the norm of the output weights of the single-hidden-layer feedforward network, maximizes the area under the ROC curve (AUC) as the fitness function, and constrains the input weights and hidden biases to a reasonable range, thereby improving the stability, convergence, and minority-class classification performance of the WELM.

(2) An improved artificial bee colony (IABC) algorithm is combined with WELM to optimize the input weights and hidden biases of the network and the weights assigned to the training samples. Incorporating a differential evolution (DE) operator into the ABC algorithm improves its exploitation ability and convergence rate, while the output weights are computed by WELM. The IABC algorithm restricts the input weights and hidden biases to a reasonable range and assigns optimal weights to the training samples to achieve good generalization performance.

(3) An ensemble WELM algorithm is proposed. It uses an error measure that accounts for the distribution of the different classes and applies a sigmoid function to this error measure to diminish the effect of noise in the datasets. The ensemble WELM can achieve better generalization performance than the compared methods because it weights each classifier by its separate errors on the two classes, which balances the classification performance of the two classes, and because it takes the effect of noise in the datasets into account.

(4) A WELM-based model is proposed to improve the prediction of pupylation sites. The model uses pseudo amino acid composition features and is trained with the ensemble WELM algorithm, iteratively training a series of ensemble WELMs on both annotated and non-annotated pupylated proteins. Because the model follows the natural distribution of the data when identifying pupylation sites in prokaryotic proteins, its prediction accuracy is satisfactory, and it can therefore be used for accurate prediction of pupylation sites.
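The WELM that these methods build on fixes a random hidden layer and then computes its output weights in closed form. The sketch below is a minimal pure-Python illustration, assuming the widely used WELM formulation: sigmoid hidden nodes, random input weights and biases in [-1, 1], a regularization constant C, class-balanced sample weights W_ii = 1/(size of sample i's class), labels in {-1, +1}, and output weights beta = (I/C + H^T W H)^-1 H^T W y. All function names and the toy data are hypothetical; in the dissertation's methods an optimizer would replace the random draw of `in_w` and `bias` and the fixed class-balanced weights.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic activation for the hidden nodes.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def class_balanced_weights(labels):
    # Each sample is weighted by 1 / (number of samples in its class),
    # so the minority class carries as much total weight as the majority.
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return [1.0 / counts[y] for y in labels]


def hidden_layer(X, in_w, bias):
    # H[k][i] = g(w_i . x_k + b_i) for sample k and hidden node i.
    return [[sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
             for w, b in zip(in_w, bias)] for x in X]


def train_welm(X, y, n_hidden=10, C=10.0, seed=0):
    """Labels y must be -1 (majority) or +1 (minority)."""
    rng = random.Random(seed)
    d = len(X[0])
    # Hidden node parameters are drawn at random and never tuned here.
    in_w = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n_hidden)]
    bias = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    H = hidden_layer(X, in_w, bias)
    w = class_balanced_weights(y)
    n = len(X)
    # Weighted regularized least squares: (I/C + H^T W H) beta = H^T W y.
    A = [[(1.0 / C if i == j else 0.0)
          + sum(w[k] * H[k][i] * H[k][j] for k in range(n))
          for j in range(n_hidden)] for i in range(n_hidden)]
    rhs = [sum(w[k] * H[k][i] * y[k] for k in range(n)) for i in range(n_hidden)]
    beta = solve(A, rhs)
    return in_w, bias, beta


def predict(model, X):
    in_w, bias, beta = model
    return [1 if sum(h * b for h, b in zip(row, beta)) >= 0 else -1
            for row in hidden_layer(X, in_w, bias)]


if __name__ == "__main__":
    X = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [0.9], [1.0]]
    y = [-1, -1, -1, -1, -1, -1, 1, 1]
    model = train_welm(X, y, n_hidden=8, C=100.0)
    print(predict(model, X))
```

Because the I/C term makes the system matrix positive definite, the solve always succeeds; a larger C fits the training data more tightly at the cost of larger output weights.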
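AUC is preferred to accuracy as a fitness criterion for imbalanced data because a classifier that always predicts the majority class can score high accuracy while having no ability to rank minority samples above majority ones. A minimal sketch, assuming AUC is computed with the Mann-Whitney pairwise formulation (ties count as half); the function names are illustrative, and how the dissertation combines AUC with the output-weight norm is not specified in the abstract, so that part is omitted.

```python
def auc(scores, labels, positive=1):
    """Area under the ROC curve via the Mann-Whitney U statistic."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


if __name__ == "__main__":
    labels = [1] + [-1] * 9           # 1 minority sample, 9 majority samples
    always_majority = [-1] * 10       # trivial classifier
    print(accuracy(always_majority, labels))  # 0.9
    print(auc(always_majority, labels))       # 0.5
```

The pairwise loop is O(n_pos * n_neg), which is fine for a sketch; a rank-based computation gives the same value in O(n log n).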
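The self-adaptive differential evolution search behind SDE-WELM can be sketched as follows. This is not the dissertation's SDE variant, whose update rules the abstract does not give; it is a stand-in that uses the jDE self-adaptation rule of Brest et al. (each individual carries its own F and CR, resampled with probability 0.1) with DE/rand/1/bin, and it clips every component to [low, high] to mimic constraining input weights and hidden biases to a reasonable range. In SDE-WELM the fitness callable would presumably decode the vector into hidden node parameters and sample weights, train a WELM, and return a validation AUC; `sde_maximize` and its parameters are hypothetical.

```python
import random


def sde_maximize(fitness, dim, pop_size=20, generations=50,
                 low=-1.0, high=1.0, seed=0):
    """DE/rand/1/bin with jDE-style self-adaptive F and CR (a stand-in rule)."""
    if pop_size < 4:
        raise ValueError("DE/rand/1 needs at least 4 individuals")
    rng = random.Random(seed)
    pop = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(pop_size)]
    fit = [fitness(x) for x in pop]
    F = [0.5] * pop_size    # per-individual scale factors
    CR = [0.9] * pop_size   # per-individual crossover rates
    for _ in range(generations):
        for i in range(pop_size):
            # jDE rule: with probability 0.1, resample F in [0.1, 1.0] and CR in [0, 1].
            Fi = 0.1 + 0.9 * rng.random() if rng.random() < 0.1 else F[i]
            CRi = rng.random() if rng.random() < 0.1 else CR[i]
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < CRi or j == j_rand:
                    v = pop[a][j] + Fi * (pop[b][j] - pop[c][j])
                    v = min(high, max(low, v))  # keep parameters in [low, high]
                else:
                    v = pop[i][j]
                trial.append(v)
            f_trial = fitness(trial)
            # Greedy selection; successful F and CR values are inherited.
            if f_trial >= fit[i]:
                pop[i], fit[i], F[i], CR[i] = trial, f_trial, Fi, CRi
    best = max(range(pop_size), key=lambda k: fit[k])
    return pop[best], fit[best]


def sphere(v):
    # Toy objective with its maximum (0.0) at (0.3, 0.3, 0.3).
    return -sum((t - 0.3) ** 2 for t in v)


if __name__ == "__main__":
    print(sde_maximize(sphere, dim=3, generations=100))
```

Selection is greedy and elitist: a trial vector replaces its parent only if it is at least as fit, so the best fitness in the population never decreases.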
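One plausible reading of the ensemble WELM weighting is sketched below, under stated assumptions: each member's error is the mean of its per-class error rates, so correct majority-class predictions cannot mask minority-class misses, and the member's vote weight is a sigmoid of that error. The sigmoid keeps weights bounded, so a member that happens to fit noisy samples cannot dominate the vote the way it could with an unbounded AdaBoost-style log((1-e)/e) weight. The `steepness` parameter and the exact sigmoid form are assumptions, not the dissertation's formula.

```python
import math


def balanced_error(preds, labels):
    """Mean of per-class error rates, so each class counts equally."""
    rates = []
    for c in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == c]
        wrong = sum(1 for i in idx if preds[i] != c)
        rates.append(wrong / len(idx))
    return sum(rates) / len(rates)


def member_weight(err, steepness=10.0):
    """Bounded vote weight in (0, 1): sigmoid of (0.5 - err); steepness is hypothetical."""
    return 1.0 / (1.0 + math.exp(-steepness * (0.5 - err)))


def weighted_vote(all_preds, weights):
    """Combine +/-1 predictions of several members by weighted majority."""
    out = []
    for i in range(len(all_preds[0])):
        s = sum(wt * p[i] for wt, p in zip(weights, all_preds))
        out.append(1 if s >= 0 else -1)
    return out


if __name__ == "__main__":
    labels = [1, -1, -1, -1]
    members = [[-1, -1, -1, -1],   # ignores the minority class
               [1, -1, 1, -1],     # catches it, one false alarm
               [1, -1, -1, 1]]     # catches it, a different false alarm
    errs = [balanced_error(p, labels) for p in members]
    weights = [member_weight(e) for e in errs]
    print(errs, weighted_vote(members, weights))
```

The first member has 75% accuracy but a balanced error of 0.5, so it receives only half the maximum weight.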
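Pseudo amino acid composition (PseAAC) extends the 20 amino-acid frequencies with lambda sequence-order correlation factors. The sketch follows the usual type-1 formulation in the style of Chou's PseAAC with a single physicochemical property: theta_k is the mean squared difference of property values for residues k positions apart, and the 20 + lambda features are f_u / D and w * theta_k / D with D = sum(f_u) + w * sum(theta_k). The property table must be supplied by the caller (the values in the example are made up), and lam = 2, w = 0.05 are illustrative defaults; the dissertation's actual feature settings are not stated in the abstract.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(raw):
    """Rescale a residue -> value table to zero mean and unit variance."""
    vals = list(raw.values())
    mean = sum(vals) / len(vals)
    std = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
    return {aa: (v - mean) / std for aa, v in raw.items()}


def pseaac(seq, prop, lam=2, w=0.05):
    """Type-1 PseAAC with one property table `prop` (caller-supplied values)."""
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    # 20 normalized occurrence frequencies.
    freq = [seq.count(aa) / n for aa in AMINO_ACIDS]
    # Sequence-order correlation factors theta_1 .. theta_lam.
    theta = []
    for k in range(1, lam + 1):
        s = sum((prop[seq[i + k]] - prop[seq[i]]) ** 2 for i in range(n - k))
        theta.append(s / (n - k))
    denom = sum(freq) + w * sum(theta)
    return [f / denom for f in freq] + [w * t / denom for t in theta]


if __name__ == "__main__":
    toy = standardize({"A": 1.8, "C": 2.5, "D": -3.5})  # made-up values
    print(pseaac("ACDAC", toy))
```

The output vector always sums to 1, so sequences of different lengths map to comparable fixed-length inputs for a classifier such as WELM.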
Keywords/Search Tags: extreme learning machine, class-imbalance learning, performance metric, ensemble learning, the prediction of pupylation sites