Kernel Logistic Regression For Imbalanced Data Classification

Posted on:2016-12-15

Degree:Master

Type:Thesis

Country:China

Candidate:P Wang

Full Text:PDF

GTID:2348330488973884

Subject:Engineering

Abstract/Summary:


Classification is a common task in many fields, such as medical diagnosis, oil detection, and credit evaluation. Recently, the difficulty of classifying imbalanced data has attracted considerable attention, and many studies have focused on solving the problem. One difference between traditional classification and imbalanced data classification is that a traditional evaluation criterion such as accuracy cannot clearly show classification performance. Therefore, a confusion matrix of the classification results is introduced, from which several evaluation criteria are derived, such as sensitivity, specificity, positive predictive value, and negative predictive value, along with comprehensive criteria such as the F-measure and the receiver operating characteristic (ROC) curve.
Applying a kernel function to logistic regression yields kernel logistic regression (KLR). It combines the merits of both: the kernel function provides a non-linear decision boundary, and logistic regression provides the posterior probabilities of the classes. When applying KLR to imbalanced data classification, the important part is not only setting the parameters by optimizing a proper objective function, but also setting the hyperparameters, which include the parameter of the kernel function, the weight of the regularization term, and the bias of the discriminant function.
To improve imbalanced data classification performance, it is important to find a proper way to adjust and set the hyperparameters. In this study, we proposed a confusion matrix-based evaluation criterion, the Harmonic Mean (HM), and used grid search with cross-validation to set the hyperparameters of KLR. To evaluate this KLR model, we compared its classification performance with that of the support vector machine (SVM) on several benchmark datasets with various imbalance ratios.
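As an illustration of the KLR model described above, the following is a minimal sketch in plain Python, not the thesis's implementation. It assumes a Gaussian (RBF) kernel with parameter `gamma`, a regularization weight `lam`, and plain gradient descent on the regularized mean negative log-likelihood; the abstract specifies none of these choices. The names `fit_klr`, `predict_proba`, and `bias_shift` are hypothetical, with `bias_shift` standing in for the adjustable bias of the discriminant function.

```python
import math

def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel value between two feature vectors (assumed kernel)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_klr(X, y, gamma=1.0, lam=0.1, lr=0.5, n_iter=2000):
    """Fit f(x) = sum_i alpha_i * k(x_i, x) + b by gradient descent.

    Objective: mean negative log-likelihood + (lam / 2) * alpha^T K alpha.
    The step size lr is a toy default suited to very small datasets.
    """
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    b = 0.0
    for _ in range(n_iter):
        f = [sum(K[i][j] * alpha[j] for j in range(n)) + b for i in range(n)]
        r = [sigmoid(f[i]) - y[i] for i in range(n)]  # residuals p_i - y_i
        # Gradient w.r.t. alpha is K (r / n + lam * alpha), since K is symmetric.
        g = [r[j] / n + lam * alpha[j] for j in range(n)]
        grad_alpha = [sum(K[i][j] * g[j] for j in range(n)) for i in range(n)]
        grad_b = sum(r) / n
        alpha = [alpha[i] - lr * grad_alpha[i] for i in range(n)]
        b -= lr * grad_b
    return alpha, b

def predict_proba(x, X, alpha, b, gamma=1.0, bias_shift=0.0):
    """Posterior P(y = 1 | x); bias_shift (hypothetical) offsets the discriminant."""
    f = sum(a * rbf_kernel(xi, x, gamma) for a, xi in zip(alpha, X)) + b
    return sigmoid(f + bias_shift)

# Toy 1-D data: the minority class (label 1) sits between two majority
# clusters, so no single linear threshold can separate the classes.
X = [[-3.0], [-2.5], [-2.0], [2.0], [2.5], [3.0], [-0.5], [0.5]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
alpha, b = fit_klr(X, y, gamma=1.0, lam=0.1)
p_inner = predict_proba([0.0], X, alpha, b)  # between the minority points
p_outer = predict_proba([2.5], X, alpha, b)  # inside a majority cluster
p_shifted = predict_proba([0.0], X, alpha, b, bias_shift=1.0)
```

On this toy data the posterior near the minority points is higher than inside the majority clusters, and a positive `bias_shift` raises the posterior for the positive class everywhere, which is one way a bias adjustment can favor the minority class.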
In the first stage of our experiment, we used the harmonic mean of four evaluation criteria to evaluate the effectiveness of KLR. We then emphasized two evaluation criteria that are of cardinal importance in particular applications. The experimental results show that in most cases KLR achieved higher values of the evaluation criteria than SVM on the benchmark datasets. This implies that KLR performed well and generalized well on several imbalanced datasets, and that it can be a good choice in combination with other methods, such as resampling and cost-sensitive learning, to enhance imbalanced data classification performance.
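The harmonic mean of confusion-matrix criteria mentioned above can be sketched as follows. This assumes the four criteria are sensitivity, specificity, positive predictive value, and negative predictive value, which the abstract lists but does not explicitly tie to HM. The function names, and the convention that an undefined ratio counts as 0, are choices made for this sketch only.

```python
def confusion_counts(y_true, y_pred):
    """Count (tp, fn, fp, tn) for binary labels, with 1 as the positive class."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn

def _ratio(num, den):
    # An undefined ratio (zero denominator) is treated as 0.0 in this sketch.
    return num / den if den else 0.0

def four_criteria(y_true, y_pred):
    """Return [sensitivity, specificity, PPV, NPV] (assumed set of four)."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    return [
        _ratio(tp, tp + fn),  # sensitivity
        _ratio(tn, tn + fp),  # specificity
        _ratio(tp, tp + fp),  # positive predictive value
        _ratio(tn, tn + fn),  # negative predictive value
    ]

def harmonic_mean(values):
    """Harmonic mean; 0.0 if any value is 0, so one failed criterion zeroes the score."""
    if any(v == 0 for v in values):
        return 0.0
    return len(values) / sum(1.0 / v for v in values)

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Two positives among ten samples (imbalance ratio 1:4).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
all_negative = [0] * 10                    # trivial majority-class classifier
mixed = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]     # finds one positive, one false alarm

acc_trivial = accuracy(y_true, all_negative)
hm_trivial = harmonic_mean(four_criteria(y_true, all_negative))
acc_mixed = accuracy(y_true, mixed)
hm_mixed = harmonic_mean(four_criteria(y_true, mixed))
```

Both classifiers reach an accuracy of 0.8, but the trivial one scores an HM of 0 while the second scores 7/11 (about 0.636), which illustrates why accuracy alone cannot show performance on imbalanced data.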

Keywords/Search Tags:

imbalanced data, kernel method, logistic regression, confusion matrix, hyperparameter


Related items

1	Research On Logistic Regression Learning Algorithm For Imbalanced Problem
2	The Application Of Data Mining Methods In Credit Card Default Prediction
3	Research On The Prediction Method For Imbalance Data Set
4	Robust Low Rank Matrix Recovery And Application Of Sparse Logistic Regression Model
5	Imbalanced Data Learning Based On Kernel Methods
6	Class Imbalance Oriented Logistic Regression
7	Research Of Speaker Identification Models Based On Kernel Methods
8	Research On Classification Method Of Imbalanced Data Set Based On Generative Adversarial Network
9	Research On The Prediction Of Insurance Payment Based On Logistic Regression Model
10	Research On The Method Of Classifier Selection Integration Based On Confusion Matrix