
Research On Direct Optimization Of PAUC Algorithm For Large-scale Data

Posted on: 2019-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: D D Song
Full Text: PDF
GTID: 2348330545461766
Subject: Computer technology
Abstract/Summary:
Over the past two decades, classification has attracted considerable research attention because of its wide range of applications. However, in several real-world classification tasks, such as bioinformatics and biomedicine, users care mainly about a specific region of the ROC curve. Traditional classification algorithms do not match this intention, since they try to optimize accuracy. To address the issue, many researchers have recently proposed building classifiers by directly optimizing the PAUC (partial AUC) metric, and the resulting models show promising performance. Most existing PAUC-maximizing classification algorithms, however, are based on batch learning, and their training efficiency is low when they are applied to large data sets. This thesis therefore aims to design efficient algorithms that directly optimize PAUC. Its main contributions are summarized as follows:

(1) The thesis first introduces the commonly used optimization algorithms for binary classification and analyzes the general evaluation criteria for binary classification. It then focuses on the AUC and PAUC measures, since PAUC is a modification of AUC. After reviewing related work on direct optimization of PAUC, it proposes designing efficient algorithms for maximizing PAUC on big data.

(2) An online algorithm for maximizing PAUC is proposed. Compared with existing state-of-the-art methods, the proposed algorithm has a faster convergence rate. By incorporating the idea of online learning, it improves the efficiency of direct PAUC optimization and makes it better suited to large-scale data settings. Specifically, a new PAUC-oriented objective function is first defined, and the SC-RMSProp strategy is integrated into the algorithm to achieve a convergence rate of O(log T / T). In addition, a "Top K" strategy adapts the algorithm to the fact that the PAUC evaluation criterion attends to only some of the samples, so that more relevant samples take part in training, which supports better accuracy. Experiments on large data sets show that the algorithm handles these problems effectively.

(3) A classification algorithm that directly optimizes PAUC is also proposed from the perspective of stochastic learning. Stochastic learning allows the algorithm to solve large-scale classification problems efficiently. Specifically, an objective function based on stochastic learning is first defined. In each iteration, positive and negative samples are randomly selected from a buffer to obtain more representative sample pairs. To further improve performance, a feature-wise update strategy, Adagrad, is also introduced; it gives the algorithm an adaptive step size for each feature dimension and makes full use of historical gradient information. Using it not only improves the PAUC value of the proposed algorithm but also reduces the algorithm's sensitivity.
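The ideas in contribution (3) can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract does not give the objective function, the FPR range, or the buffer policy, so everything below is an assumption made for illustration. The sketch assumes a linear scorer, a pairwise hinge loss, fixed in-memory arrays standing in for the buffers, and PAUC measured over the false-positive-rate range [0, beta]. The function names (partial_auc_top_k, adagrad_pairwise_step, train) are hypothetical.

```python
import numpy as np


def partial_auc_top_k(pos_scores, neg_scores, beta):
    """Empirical PAUC over the FPR range [0, beta], normalised to [0, 1].

    Assumption: the range is approximated by the k = ceil(beta * n_neg)
    highest-scoring negatives. Ties between a positive and a negative
    count as one half.
    """
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    k = max(1, int(np.ceil(beta * neg.size)))
    top_neg = np.sort(neg)[-k:]  # the k hardest negatives
    diff = pos[:, None] - top_neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def adagrad_pairwise_step(w, g_accum, x_pos, x_neg, lr=0.1, eps=1e-8):
    """One Adagrad step on the hinge loss max(0, 1 - w.(x_pos - x_neg)).

    Returns new (w, g_accum) arrays; the inputs are left unmodified.
    """
    d = x_pos - x_neg
    if 1.0 - w @ d > 0.0:  # margin violated, so the subgradient is -d
        grad = -d
        g_accum = g_accum + grad ** 2  # per-feature squared-gradient history
        w = w - lr * grad / (np.sqrt(g_accum) + eps)  # per-feature step size
    return w, g_accum


def train(X_pos, X_neg, steps=200, lr=0.1, seed=0):
    """Stochastic training: one random positive/negative pair per iteration.

    Assumption: X_pos and X_neg play the role of the two sample buffers.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X_pos.shape[1])
    g_accum = np.zeros_like(w)
    for _ in range(steps):
        x_pos = X_pos[rng.integers(len(X_pos))]
        x_neg = X_neg[rng.integers(len(X_neg))]
        w, g_accum = adagrad_pairwise_step(w, g_accum, x_pos, x_neg, lr)
    return w


if __name__ == "__main__":
    # Synthetic, imbalanced toy data (hypothetical, for demonstration only).
    rng = np.random.default_rng(1)
    X_pos = rng.normal(loc=2.0, size=(50, 3))
    X_neg = rng.normal(loc=-2.0, size=(500, 3))
    w = train(X_pos, X_neg)
    print("PAUC on FPR [0, 0.1]:", partial_auc_top_k(X_pos @ w, X_neg @ w, 0.1))
```

In this sketch the pairs are drawn uniformly at random, so the update itself is not yet focused on the PAUC region. One possible variant in the spirit of the "Top K" strategy in contribution (2) would draw x_neg only from the currently highest-scoring negatives, though the abstract does not specify how the thesis does this.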
Keywords/Search Tags: Imbalanced binary classification, PAUC, AUC, Online learning, Stochastic learning