
Maximize AUC With Outlier Detection For Positive-unlabeled Classification And Incremental Algorithm

Posted on: 2022-06-17 | Degree: Master | Type: Thesis
Country: China | Candidate: Y M Ma | Full Text: PDF
GTID: 2518306527978109 | Subject: Software engineering
Abstract/Summary:
Positive-unlabeled (PU) classification refers to learning a classifier when only positive examples and unlabeled samples are available. Because no labeled negative examples exist, traditional classification methods often perform poorly in this setting. Building a robust classifier for the PU problem is especially challenging for complex data with overwhelmingly many negative samples and with outliers (mislabeled samples). This situation is common in practical applications such as healthcare, text classification and bioinformatics. With the rapid growth of data, classifying quickly and accurately and realizing online learning have become hot research topics. This thesis therefore focuses on incremental PU classification in the presence of outliers, as follows.

First, the thesis applies the squared loss function for maximizing AUC to PU classification and realizes incremental learning; the resulting method is called the incremental kernel maximum-AUC algorithm (IKMAUC). The algorithm uses the Gaussian kernel function to map linearly inseparable data from the low-dimensional input space into a high-dimensional space where it becomes linearly separable. Optimizing the AUC objective function yields an analytical solution, which avoids repeated iterations, and the block matrix inversion formula is used to solve the resulting linear system. When a new sample arrives, its representation in the high-dimensional space is added and the Sherman-Morrison-Woodbury formula is used to update the model weights, which speeds up the computation. Compared with an ideal support vector machine (SVM) for which the labels of all positive and negative examples in the training set are known, the proposed algorithm achieves similar performance, and it outperforms four other PU classification algorithms. The batch KMAUC algorithm (without incremental learning) is also compared with the incremental IKMAUC algorithm; the experimental results show that incremental learning greatly reduces the computing time, making IKMAUC a powerful tool for real problems.

However, on data sets containing outliers (mislabeled samples), the classification accuracy of IKMAUC is unsatisfactory. To address this, the thesis further proposes an incremental kernel maximum-AUC algorithm with outlier detection (IKMAUC-OD). For the outlier problem in PU classification, the algorithm introduces an AUC squared loss function that incorporates outlier detection, and optimizes the objective function with gradient descent and least squares. The experimental results show that the proposed algorithm still achieves performance similar to the ideal SVM in the presence of outliers and supports fast incremental updates. It achieves significant results on both synthetic data sets and UCI data sets and is a powerful tool for dealing with real problems.
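As a rough illustration (not the thesis's exact derivation), the sketch below shows the two building blocks described above: the closed-form minimizer of a pairwise squared AUC loss on PU data under a Gaussian kernel expansion, and the Sherman-Morrison rank-one inverse update on which incremental refitting of such a solution can rely. The kernel width `gamma`, the regularizer `lam`, and all function names are illustrative assumptions.

```python
# Minimal sketch, assuming a pairwise squared AUC loss of the form
#   (1 / (n_p * n_u)) * sum_{i in P, j in U} (1 - f(x_i) + f(x_j))^2 + lam * ||w||^2,
# with f(x) = sum_k w_k * K(x, z_k) expanded on all training points z_k.
# This is not the thesis's exact formulation or update rule.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_kmauc(X_pos, X_unl, gamma=1.0, lam=1e-2):
    """Closed-form weights for the pairwise squared AUC loss above.

    Setting the gradient to zero gives (H + lam * I) w = m_p - m_u, where
    m_p, m_u are mean kernel rows of the positive and unlabeled parts and
    H collects the second-order pairwise statistics.
    """
    Z = np.vstack([X_pos, X_unl])                  # kernel expansion points
    Kp = rbf_kernel(X_pos, Z, gamma)               # rows: positive samples
    Ku = rbf_kernel(X_unl, Z, gamma)               # rows: unlabeled samples
    n_p, n_u = len(Kp), len(Ku)
    m_p, m_u = Kp.mean(axis=0), Ku.mean(axis=0)    # mean kernel rows
    H = (Kp.T @ Kp) / n_p + (Ku.T @ Ku) / n_u \
        - np.outer(m_p, m_u) - np.outer(m_u, m_p)
    A_inv = np.linalg.inv(H + lam * np.eye(Z.shape[0]))
    w = A_inv @ (m_p - m_u)                        # analytical solution

    def score(X):
        """Decision score; larger values indicate 'more positive'."""
        return rbf_kernel(X, Z, gamma) @ w

    return w, score, A_inv

def sherman_morrison_update(A_inv, u, v):
    """Inverse of (A + u v^T) from A^{-1} in O(d^2) instead of O(d^3):
    the rank-one identity that incremental refitting builds on."""
    Au = A_inv @ u
    return A_inv - np.outer(Au, v @ A_inv) / (1.0 + v @ Au)
```

In this simplified picture, `sherman_morrison_update` stands in for the role the Sherman-Morrison-Woodbury formula plays in the thesis: when new samples arrive, the inverse of the regularized system matrix is patched with low-rank corrections rather than refactorized from scratch, which is what keeps the per-sample cost low; the thesis additionally uses the block matrix inversion formula to handle the growing kernel matrix, a detail not reproduced here.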
Keywords/Search Tags:machine learning, positive-unlabeled classification, outlier detection, AUC, incremental algorithm