
Kernel Logistic Regression For Imbalanced Data Classification

Posted on: 2016-12-15
Degree: Master
Type: Thesis
Country: China
Candidate: P Wang
GTID: 2348330488973884
Subject: Engineering

Abstract/Summary:
Classification is a common task in numerous fields, such as medical diagnosis, oil detection and credit evaluation. Recently, imbalanced data classification has attracted considerable attention, and many studies have focused on solving this problem. One difference between traditional and imbalanced data classification is that traditional evaluation criteria such as accuracy cannot clearly reflect classification performance: a classifier that assigns every sample to the majority class can still reach high accuracy while never detecting the minority class. Therefore, a confusion matrix of the classification results is introduced, from which several evaluation criteria are derived: sensitivity, specificity, positive predictive value and negative predictive value, as well as comprehensive criteria such as the F-measure and the receiver operating characteristic (ROC) curve.
Applying a kernel function in logistic regression yields kernel logistic regression (KLR). By combining logistic regression with a kernel function, KLR has two advantages: first, it provides a non-linear decision boundary, and second, it outputs the posterior probabilities of the classes. The key to applying KLR to imbalanced data classification lies not only in setting the parameters by optimizing a proper objective function, but also in setting the hyperparameters, which include the kernel function parameter, the weight of the regularization term and the bias of the discriminant function.
To improve imbalanced data classification performance, it is important to find a proper way to adjust and set these hyperparameters. In this study, we proposed a confusion-matrix-based criterion, the Harmonic Mean (HM), as the evaluation criterion, and used grid search and cross-validation to set the hyperparameters of KLR. To evaluate this KLR model, we compared its classification performance with that of a support vector machine (SVM) on several benchmark datasets with various imbalance ratios.
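The confusion-matrix criteria listed above can be computed directly from the four cell counts. The following is only an illustrative sketch: the function names are hypothetical, the minority class is assumed to be labelled 1 and treated as positive, and HM is assumed to be the harmonic mean of sensitivity, specificity, PPV and NPV (the abstract says four criteria were combined but this page does not name them). The usage lines show why accuracy alone can mislead on imbalanced data.

```python
from typing import Dict, NamedTuple, Sequence


class Confusion(NamedTuple):
    tp: int
    fp: int
    tn: int
    fn: int


def confusion(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Confusion:
    """Count the cells of a 2x2 confusion matrix; `positive` is the minority class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, tn, fn)


def ratio(num: float, den: float) -> float:
    """Division that returns 0.0 instead of failing on an empty denominator."""
    return num / den if den else 0.0


def criteria(c: Confusion) -> Dict[str, float]:
    sens = ratio(c.tp, c.tp + c.fn)  # sensitivity (recall of the positive class)
    spec = ratio(c.tn, c.tn + c.fp)  # specificity
    ppv = ratio(c.tp, c.tp + c.fp)   # positive predictive value (precision)
    npv = ratio(c.tn, c.tn + c.fn)   # negative predictive value
    four = [sens, spec, ppv, npv]
    # Assumed HM: harmonic mean of the four rates; it is 0 whenever any rate is 0.
    hm = 0.0 if min(four) == 0 else len(four) / sum(1.0 / v for v in four)
    return {
        "accuracy": ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "sensitivity": sens,
        "specificity": spec,
        "ppv": ppv,
        "npv": npv,
        "f_measure": ratio(2 * ppv * sens, ppv + sens),
        "hm": hm,
    }


# 5 minority vs 95 majority samples; a classifier that always predicts the majority class.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
report = criteria(confusion(y_true, y_pred))
# report["accuracy"] == 0.95, but report["sensitivity"] == 0.0 and report["hm"] == 0.0
```

Because a harmonic mean is dominated by its smallest term, HM stays low unless every one of the four rates is reasonably high, which is what makes it less forgiving than accuracy on skewed data.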
In the first stage of our experiments, we used the harmonic mean of four evaluation criteria to evaluate the effectiveness of KLR. We then emphasized two evaluation criteria that are of cardinal importance in particular applications. The experimental results show that, in most cases, KLR achieved higher values of the evaluation criteria than SVM on the benchmark datasets. This implies that KLR performed well and generalized well on several imbalanced datasets, and that it can be a good choice, in combination with other methods such as resampling and cost-sensitive learning, for enhancing imbalanced data classification performance.
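The KLR tuning procedure summarized in this abstract, a grid search with cross-validation over the kernel parameter, the regularization weight and the discriminant bias, scored by HM, might look roughly like the NumPy sketch below. It is not the thesis's implementation. The Gaussian (RBF) kernel, the plain gradient-descent optimizer, the treatment of the bias as a shift added to the trained discriminant function before thresholding at 0, the four rates inside HM, the grid values and the toy data are all assumptions, and every function name is hypothetical.

```python
import itertools

import numpy as np

# Hypothetical sketch. Assumptions: RBF kernel, gradient-descent fitting,
# bias = post-hoc shift of f(x), HM over sensitivity/specificity/PPV/NPV.


def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def fit_klr(X, y, gamma, lam, n_iter=500):
    """Fit alpha in f(x) = sum_i alpha_i k(x_i, x) for labels y in {0, 1} by
    minimising negative log-likelihood + (lam / 2) * alpha^T K alpha with
    gradient descent in the RKHS (descent direction p - y + lam * alpha)."""
    K = rbf_kernel(X, X, gamma)
    alpha = np.zeros(len(y))
    # The objective's curvature is at most lambda_max(K) / 4 + lam, so this step is safe.
    step = 1.0 / (np.linalg.eigvalsh(K)[-1] / 4.0 + lam)
    for _ in range(n_iter):
        p = 0.5 * (1.0 + np.tanh(0.5 * (K @ alpha)))  # stable sigmoid: P(y=1 | x_i)
        alpha -= step * (p - y + lam * alpha)
    return alpha


def hm_score(y_true, y_pred):
    """Harmonic mean of sensitivity, specificity, PPV and NPV (0 if any is 0)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    pairs = [(tp, tp + fn), (tn, tn + fp), (tp, tp + fp), (tn, tn + fn)]
    if any(num == 0 for num, _ in pairs):
        return 0.0
    return len(pairs) / sum(den / num for num, den in pairs)


def grid_search_klr(X, y, gammas, lams, biases, k=3, seed=0):
    """Return the (gamma, lam, bias) with the best mean k-fold HM, and that score."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):  # stratified split: deal each class round-robin into folds
        for j, i in enumerate(rng.permutation(np.flatnonzero(y == cls))):
            folds[j % k].append(i)
    scores = {}
    for gamma, lam in itertools.product(gammas, lams):
        held_out = []  # (labels, discriminant values) for each held-out fold
        for fold in folds:
            test = np.array(fold)
            train = np.setdiff1d(np.arange(len(y)), test)
            alpha = fit_klr(X[train], y[train], gamma, lam)
            held_out.append((y[test], rbf_kernel(X[test], X[train], gamma) @ alpha))
        for bias in biases:  # the bias only shifts f(x), so no refit is needed
            scores[(gamma, lam, bias)] = np.mean(
                [hm_score(yt, (f + bias > 0).astype(int)) for yt, f in held_out])
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical imbalanced toy data: 60 majority (label 0) vs 12 minority (label 1) points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (60, 2)), rng.normal(3.0, 0.5, (12, 2))])
y = np.array([0] * 60 + [1] * 12)
best, score = grid_search_klr(X, y, gammas=[0.1, 1.0], lams=[0.01, 0.1],
                              biases=[-0.5, 0.0, 0.5, 1.0])
print(best, round(score, 3))
```

Because the bias only shifts the discriminant values, each (gamma, lam) pair is fitted once per fold and every bias value is scored from the same held-out outputs. Replacing `hm_score` with a score built from the two criteria that matter most in a given application would be one way to mimic the second experimental stage, again as an assumption rather than the thesis's exact setup.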
Keywords/Search Tags: imbalanced data, kernel method, logistic regression, confusion matrix, hyperparameter