Font Size: a A A

Research On Construction Algorithm Of Sparse Model For Imbalanced Binary Classification

Posted on:2018-09-18Degree:MasterType:Thesis
Country:ChinaCandidate:Y WangFull Text:PDF
GTID:2348330515492888Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
With the progress of society and the development of science,peoples' lives have been changed greatly.At the same time,more and more data is accumulated,which brings convenience as well as challenge to the people.How to find useful information from this large amount of data becomes an urgent problem to be solved.The emergence of machine learning provides an effective way to tackle the problem above.In the literature of machine learning,classification,in particular,the binary classification is one of the hottest researches,due to its wide applications.However,in the real life,many applications such as web search engine or personalized recommendation system are based on the binary classification with imbalance data.What's more,those applications often have the characteristics of high data dimension.The existing balanced binary classification algorithms can not be direct applied.To fill the gap,recently,some researchers proposed to construct the sparse model with direct optimization of imbalanced measures.These works are worthwhile in the endeavor of achieving competitive generalization performance.Nevertheless,most of them are focused on the metrics which are easy to be decomposed such as AUC or F1,and for the complicated measures,there are very few works.Based on this,this thesis tough on the issue of designing sparse model for the complex imbalanced measures.The main contributions of the thesis can be summarized as follows:(1)First of all,we begin our work with the introduction of binary classification,and present the comparison between balanced and imbalanced measures.Then we analyze the current researches on the binary classification oriented to imbalanced metrics,especially the topic of construction algorithms for the sparse models.On the basis of it,we propose to design the algorithms for sparse models oriented to complex imbalanced measures.(2)Secondly,different from the existing algorithms,which mainly focused on AUC or F1 that have the property of easy decomposition.We take QM as an example,and present a construction algorithm of sparse model for QM.More specifically,we first define the objective function with QM.For the problem that the new function is non-smooth,and difficult to be optimized,we propose to use the cutting plane algorithm,which can not only solve the problem,but also have the iterations of O(1/?).The experimental results on the imbalanced benchmark data sets show thatwhen measured by QM,the algorithm we proposed can obtain effective sparse model with high accuracy.(3)Lastly,from the perspective of learning method,the current algorithms for the imbalanced sparse model all adopt batch learning,which may have high computational efficiency and are not suitable for the large scale applications.To tackle the problem,we present a construction algorithm based on stochastic learning.Different from existing works that designed algorithms only oriented to one measure,the algorithm we proposed can be applied to a family of the imbalanced measures.More specifically,we focus on the family of pseudo-linear metrics,and begin our work with transforming the optimizing pseudo-linear metrics into a cost-sensitive problem.Then we adopt the COMID(Composite Objective Mirror Descent)method as the inner optimization algorithm,which can obtain effective sparse model with comparable results.For the problem that original COMID algorithm only achieve O(log T/T)convergence rate even for the strongly convex objective function,we present a simple modified algorithm based on polynomial-decay averaging,and prove that new algorithm can not only obtain O(1/T)optimal rate but also compute on the fly.Empirical evaluations on the large scale imbalanced data sets demonstrate efficient and the effective of the proposed algorithms.
Keywords/Search Tags:Imbalanced binary classification, QM, Sparsity, Cutting plane algorithm, COMID, Pseudo linear
PDF Full Text Request
Related items