
Research On Direct Optimization Of AUC Algorithm Based On Online Learning

Posted on: 2021-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Liu
GTID: 2428330629980111
Subject: Computer technology
Abstract/Summary:
Classification learning is an important research field in machine learning and data mining. Within it, binary classification has attracted particular attention because of its wide range of applications. Most traditional binary classification algorithms assume a balanced setting, whereas in practice real data are often unevenly distributed between the two classes. As a result, many scholars have taken a strong interest in imbalanced binary classification, and many algorithms have been proposed that directly optimize a classification criterion. A representative example is direct AUC optimization, which has become a research hotspot because it focuses on the partial order between positive and negative samples and has achieved good results. Most existing direct AUC optimization algorithms use batch learning, which requires storing a large number of samples and computing the gradient information of all samples in a single pass; this reduces efficiency and makes them unsuitable for large-scale data. In this context, this thesis combines online learning with direct AUC optimization and studies online AUC optimization algorithms for large-scale data. Taking advantage of online learning in large-scale settings, it first proposes an online AUC optimization algorithm based on adaptive regularization, and then proposes a sparse online AUC optimization algorithm based on adaptive updates for large-scale, high-dimensional settings. The main work is summarized as follows:

(1) Traditional online learning is designed to process one sample at a time, so it is not suited to the sample-pair problem that AUC poses. This thesis therefore proposes an online AUC optimization algorithm based on adaptive regularization. Specifically, the model is assumed to follow a multivariate Gaussian distribution, i.e., w ~ N(μ, Σ). Based on the difference between the empirical distribution and the probability distribution, and combining three properties (large-margin training, confidence weighting, and the ability to handle non-separable data), an AUC-oriented objective function is defined. Each time a new sample is received, adaptive regularization of the prediction function yields the classification model efficiently. The algorithm is also linked to confidence-weighted online learning: confidence (an inverse measure of the eigenvalues of the covariance matrix Σ) increases as samples are processed iteratively and reflects the correlation between the dimensions of the data. This correlation is used to adapt the learning rate, which effectively improves the overall performance of the algorithm. Theoretical analysis shows that the regret bound of the proposed algorithm is O(T), and its effectiveness is verified on large-scale experimental data sets.

(2) Many real data sets are not only large in size but also very high in dimension. Although existing online AUC optimization algorithms achieve good classification results, they have paid little attention to high-dimensional data. This thesis therefore proposes an adaptively updated sparse online AUC optimization algorithm for high-dimensional data. For large-scale, high-dimensional problems, the AUC maximization problem is first transformed into a strongly convex optimization problem with an L1 regularization term. COMID is used as the internal optimization algorithm, with the Bregman divergence incorporated to govern how the model changes between updates. In addition, Adagrad is used to exploit the second-order information of the gradient, giving an adaptive step size for each dimension so that the sparse classification model is updated effectively. To further improve performance, a strategy based on polynomial decay is proposed. Theoretical analysis and experiments on large-scale, high-dimensional data demonstrate the effectiveness of the proposed algorithm.
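
To make the sample-pair view of AUC concrete, the following is a minimal Python sketch, not the thesis's algorithm. It computes AUC as the fraction of correctly ordered positive-negative pairs and runs plain online gradient descent on a pairwise hinge loss. The function names, the fixed-size FIFO buffers, the learning rate, and the synthetic data are all assumptions made for illustration; the adaptive regularization and confidence weighting described in the abstract are not reproduced here.

```python
import numpy as np


def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    diff = pos - neg
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def online_pairwise_auc(X, y, lr=0.1, buffer_size=50):
    """One pass of online gradient descent on a pairwise hinge loss.

    Each incoming sample is paired with buffered samples of the opposite
    class. The FIFO buffers are an assumption of this sketch.
    """
    w = np.zeros(X.shape[1])
    buffers = {1: [], -1: []}
    for x_t, y_t in zip(X, y):
        y_t = int(y_t)
        opposite = buffers[-y_t]
        if opposite:
            # Every row of Z is (x_pos - x_neg) for one pair.
            Z = y_t * (x_t - np.array(opposite))
            violated = Z @ w < 1.0  # pairs with margin below 1
            # Subgradient step on the average hinge loss over the pairs.
            w += lr * Z[violated].sum(axis=0) / len(Z)
        buffers[y_t].append(x_t)
        if len(buffers[y_t]) > buffer_size:
            buffers[y_t].pop(0)
    return w


# Hypothetical imbalanced data: about 10% positives, shifted along feature 0.
rng = np.random.default_rng(0)
n, d = 500, 5
y = np.where(rng.random(n) < 0.1, 1, -1)
X = rng.normal(size=(n, d))
X[y == 1, 0] += 3.0

w = online_pairwise_auc(X, y)
train_auc = auc(X[y == 1] @ w, X[y == -1] @ w)
print(f"training AUC: {train_auc:.3f}")
```

The buffers keep memory bounded, which is the property that motivates online learning over batch AUC optimization in the abstract; the pairing scheme the thesis actually uses may differ.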
Keywords: Imbalanced binary classification, AUC, Online learning, Sparse learning
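
As an illustration of the sparse update described in contribution (2), the sketch below shows one step of composite mirror descent with a diagonal Adagrad metric and an L1 term, which has a closed-form soft-thresholding solution. It is a generic textbook-style step, assumed here for illustration only: the function name `adagrad_l1_step` and the parameters `eta`, `lam`, and `delta` are hypothetical, the gradient `g` is assumed to come from some pairwise AUC loss, and the thesis's polynomial decay strategy is not included.

```python
import numpy as np


def adagrad_l1_step(w, g, G, eta=0.1, lam=0.01, delta=1e-8):
    """One diagonal-Adagrad composite mirror descent step with an L1 penalty.

    w: current weights, g: loss gradient at w,
    G: running sum of squared gradients (per dimension).
    Returns the updated weights and the updated G.
    """
    G = G + g * g                          # accumulate squared gradients
    step = eta / (delta + np.sqrt(G))      # per-dimension adaptive step size
    u = w - step * g                       # plain adaptive gradient step
    # Soft-thresholding: closed-form solution of the L1-regularized subproblem.
    w_new = np.sign(u) * np.maximum(np.abs(u) - lam * step, 0.0)
    return w_new, G


w0 = np.zeros(3)
G0 = np.zeros(3)
g = np.array([1.0, -1.0, 0.0])

w_small, _ = adagrad_l1_step(w0, g, G0, eta=0.1, lam=0.5)
w_large, _ = adagrad_l1_step(w0, g, G0, eta=0.1, lam=2.0)
print(w_small, w_large)
```

With a larger `lam` the threshold exceeds the size of the gradient step, so the corresponding weights are set exactly to zero; this is how the L1 term produces a sparse model.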