Research On Imbalanced Binary Classification Algorithm Based On Evolutionary Multi-Objective Optimization

Posted on: 2020-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: G L Fu
GTID: 2428330575471329
Subject: Computer Science and Technology
Abstract/Summary:
Classification is an important research field in machine learning and data mining and has received wide attention from researchers. Binary classification is its most widely studied form, and many existing algorithms focus on the balanced case. In practical applications, however, the collected data are usually imbalanced between the two classes, a setting known as imbalanced binary classification. Many scholars have studied this setting and proposed common metrics such as AUC. These metrics have advanced the field, but as imbalanced binary classification has spread to more applications, the particularities of real problems have brought new challenges. In pedestrian detection, for example, the user is interested in the partial area under the ROC curve (pAUC) rather than the full AUC. Traditional methods use AUC as the optimization objective and are difficult to apply directly to pAUC optimization. Similarly, when training a gene name recognizer in the biological field, only positive examples (annotated genes) are available, together with a large set of unlabeled data; this setting is called positive-unlabeled (PU) learning.

In recent years, researchers have proposed many algorithms for pAUC optimization and PU learning. However, most existing works use traditional optimization techniques, which require the objective to satisfy certain assumptions, such as convexity or continuity. Evolutionary algorithms (EAs), as a newer optimization approach, offer good parallelism and strong global search ability, do not require such assumptions on the objective function, and have successfully solved many complex optimization problems. On this basis, this thesis proposes, from the perspective of evolutionary algorithms, a pAUC optimization algorithm and a PU learning algorithm, both based on evolutionary multi-objective optimization. The main work and results of this thesis are summarized as follows:

(1) This thesis proposes a pAUC optimization algorithm (MOPA) based on evolutionary multi-objective optimization. Two components were developed to ensure that the proposed algorithm focuses on the partial AUC between any two given false positive rates. First, a new metric (K-FPR) is proposed that considers only a partial range of the false positive rate (FPR) by focusing on the top-K negative samples. Second, a preference-based reference point set strategy is designed within the AR-MOEA framework, so that the search process of MOPA focuses on the partial AUC that users prefer. Specifically, the original reference point set of AR-MOEA is first initialized, that is, the weight vectors are mapped to the vicinity of a preset preference point; the search then uses the preference-based reference points to update the archive A and the reference point set R. Experimental results on different data sets demonstrate the superiority of MOPA.

(2) This thesis proposes a PU learning algorithm based on evolutionary multi-objective optimization (MOP-PUL). To address training with only a small number of positive samples and a large number of unlabeled samples, this thesis combines evolutionary multi-objective optimization with PU learning, using 0-1 encoding to transform PU learning into a sparse-solution problem. It proposes an initialization strategy based on Euclidean distance and an individual update strategy based on adaptive mutation probability. Specifically, at initialization the algorithm calculates the average Euclidean distance between the samples in U (the unlabeled set) and the samples in P (the positive set) and takes this distance as the fitness value of the samples in U. During evolution, the mutation probability of each individual is dynamically adjusted using statistics of historical information, and a new objective is used to evaluate individuals. In comparisons with state-of-the-art algorithms, numerical experiments on different benchmark data sets demonstrate the competitiveness of the proposed method.
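Two quantities mentioned in the abstract can be illustrated with a short sketch. This is not the thesis code: the function names `partial_auc` and `mean_distance_to_positives` are hypothetical, the pAUC helper uses the standard pairwise-counting definition rather than the K-FPR metric itself, and it assumes that each FPR bound times the number of negatives is a whole number. The distance helper assumes the average is taken per unlabeled sample; how MOP-PUL turns these distances into an initial population is not specified in the abstract and is not shown.

```python
import math


def partial_auc(pos_scores, neg_scores, fpr_low, fpr_high):
    """Normalized partial AUC over the FPR range [fpr_low, fpr_high].

    Assumption: fpr_low * n_neg and fpr_high * n_neg are whole numbers, so
    the FPR range maps exactly onto a slice of the ranked negatives.
    """
    n_neg = len(neg_scores)
    lo = round(fpr_low * n_neg)
    hi = round(fpr_high * n_neg)
    if not pos_scores or not 0 <= lo < hi <= n_neg:
        raise ValueError("need positives and 0 <= fpr_low < fpr_high <= 1")
    # Negatives ranked from highest to lowest score. The slice [lo:hi] holds
    # the negatives that turn into false positives as the FPR moves from
    # fpr_low to fpr_high, i.e. only top-ranked negatives matter.
    ranked_negatives = sorted(neg_scores, reverse=True)[lo:hi]
    wins = 0.0
    for p in pos_scores:
        for n in ranked_negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above the negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos_scores) * len(ranked_negatives))


def mean_distance_to_positives(unlabeled, positives):
    """Average Euclidean distance from each unlabeled sample to all positives.

    Assumption: one value per sample in U, averaged over every sample in P.
    """
    return [
        sum(math.dist(u, p) for p in positives) / len(positives)
        for u in unlabeled
    ]
```

With `fpr_low=0` and `fpr_high=1` the first helper reduces to the ordinary AUC, which makes it easy to check against a full-AUC implementation. The double loop is quadratic in the number of samples and is meant for illustration only.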
Keywords/Search Tags:Imbalanced Binary Classification, Evolutionary Algorithms, Multi-Objective Optimization, AUC, pAUC, Users' preference