Research On Logistic Regression Learning Algorithm For Imbalanced Problem

Posted on: 2018-05-27 | Degree: Master | Type: Thesis
Country: China | Candidate: G R Zheng
GTID: 2348330515952087 | Subject: Systems analysis and integration
Abstract/Summary:
Imbalanced classification is one of the crucial issues in the field of machine learning and pattern recognition. It is characterized by one class having significantly more instances than another. In many real-world applications, correctly predicting examples of the minority class matters more than correctly predicting examples of the majority class. For example, in cancer detection only a small fraction of patients actually have cancer, and recognizing those patients effectively is what matters most. As a classic statistical classification method, logistic regression pursues high overall accuracy under the assumption that the classes contain similar numbers of examples, so minority-class examples are often overlooked and misclassified as the majority class. To solve this problem, this paper proposes two methods to improve the classification performance of logistic regression on imbalanced data. The innovations are as follows:

(1) Logistic Regression for Imbalanced Learning. The traditional logistic regression algorithm uses MLE (Maximum Likelihood Estimation) to estimate the model parameters. However, MLE has difficulty capturing the features of the minority class. To solve this problem, a novel method called MLER (MLE and Recall) is introduced in this paper. Unlike MLE, MLER takes the accuracy and the recall rate of the model into account at the same time, which ensures the performance of the model on all classes. Based on MLER, the new method LRIL (Logistic Regression for Imbalanced Learning) was designed to address the imbalance problem. Experimental results on UCI datasets show that, compared with traditional logistic regression, LRIL effectively improves the recall rate, f-measure and g-mean while maintaining the high accuracy of logistic regression. In addition, LRIL performs better than under-sampled and over-sampled logistic regression.

(2) Imbalanced Learning Based on k-means and Logistic Regression. Since traditional classification methods do not work well on imbalanced classes, we combine k-means with the logistic regression model and propose a novel method named ILKLR (Imbalanced Learning based on k-means and Logistic Regression) for the imbalanced problem. ILKLR first applies a clustering method to divide the majority class into small clusters, which rebalances the dataset for the learning of the logistic regression model. Experiments on UCI datasets show that the proposed method is significantly better on recall, g-mean and f-measure than logistic regression and under-sampled and over-sampled logistic regression.
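The abstract compares methods on recall, f-measure and g-mean rather than accuracy alone, but it does not spell out the formulas. The sketch below uses the standard textbook definitions and is not taken from the thesis: it assumes the minority class is the positive class, that f-measure means the balanced F1 score, and that g-mean is the geometric mean of recall and specificity. The function name `imbalance_metrics` and the toy labels are hypothetical, chosen only for illustration.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, recall, f-measure and g-mean for a binary task.

    Assumption: `positive` marks the minority class; every other label
    is treated as the majority (negative) class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0

    # Balanced F1: harmonic mean of precision and recall.
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    # G-mean: geometric mean of recall and specificity.
    g_mean = math.sqrt(recall * specificity)

    return {
        "accuracy": accuracy,
        "recall": recall,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }


# Toy data: 95 majority examples (label 0) and 5 minority examples (label 1).
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class.
always_majority = imbalance_metrics(y_true, [0] * 100)

# A classifier that finds 4 of 5 minority examples at the cost of
# 10 false alarms on the majority class.
y_pred = [1] * 10 + [0] * 85 + [1] * 4 + [0] * 1
minority_aware = imbalance_metrics(y_true, y_pred)

print(always_majority)
print(minority_aware)
```

On this toy data the always-majority classifier reaches high accuracy while its recall, f-measure and g-mean are all zero, which is the failure mode the abstract attributes to plain logistic regression. The second classifier gives up some accuracy but scores well above zero on the three minority-sensitive metrics.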
Keywords/Search Tags: logistic regression, imbalanced, classification, recall, g-mean, f-measure