
Research on the Rare-Class Problem Based on Logistic Discrimination

Posted on: 2016-08-20
Degree: Master
Type: Thesis
Country: China
Candidate: H T Mao
Full Text: PDF
GTID: 2298330467495356
Subject: Systems analysis and integration
Abstract/Summary:
The rare-class problem is also known as the class-imbalance problem. Its defining characteristic is that one class (the majority or negative class) has far more instances than the other (the minority or positive class). In applications, correctly identifying rare-class instances is usually more valuable than correctly identifying majority-class instances. However, conventional classification methods pursue high overall accuracy under the assumption that all classes have a similar number of instances, so rare-class instances are often overlooked and misclassified as belonging to the majority class.

Many approaches proposed for this problem fall into two groups: data-level and algorithm-level approaches. Data-level approaches resample the training set to rebalance the class distribution and then learn a model on the rebalanced data. Algorithm-level approaches adapt existing classifier learning algorithms to bias them towards the rare class.

Based on LD (Logistic Discrimination), we propose a novel method called LDRC (LD based Rare-class Classification) to improve the generalization performance of LD on the rare-class problem. Making full use of the characteristics of the rare class, we construct a new objective function, MRP (Metric based on Recall and Precision), which takes into account the recall of both the positive and negative classes as well as the precision of the positive class. The recall of both classes ensures that LDRC generalizes better on recall and g-mean, while the precision and recall of the positive class ensure that LDRC generalizes better on accuracy and f-measure. LDRC learns its parameters with MRP as the objective function to obtain better performance. Experiments on UCI data sets show that the proposed method has a significant advantage over LD, LD with under-sampling, and LD with over-sampling on the measures of recall, g-mean, and f-measure.
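The evaluation measures named in the abstract can be illustrated with a short sketch. The code below uses the standard textbook definitions of recall, precision, g-mean, and f-measure computed from a binary confusion matrix. It is not the thesis's MRP objective, whose exact form the abstract does not give. The function names and the toy label vectors are assumptions made only for illustration.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the rare class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a class or prediction set is empty."""
    return numerator / denominator if denominator else 0.0


def rare_class_metrics(y_true, y_pred, positive=1):
    """Standard measures for imbalanced binary classification."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    recall_pos = safe_div(tp, tp + fn)      # recall of the rare (positive) class
    recall_neg = safe_div(tn, tn + fp)      # recall of the majority (negative) class
    precision_pos = safe_div(tp, tp + fp)   # precision of the rare class
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "recall_pos": recall_pos,
        "recall_neg": recall_neg,
        "precision_pos": precision_pos,
        "g_mean": math.sqrt(recall_pos * recall_neg),
        "f_measure": safe_div(2 * precision_pos * recall_pos,
                              precision_pos + recall_pos),
    }


# Toy data (hypothetical): 95 majority instances, 5 rare instances.
y_true = [0] * 95 + [1] * 5

# A classifier that ignores the rare class entirely.
all_negative = rare_class_metrics(y_true, [0] * 100)

# A classifier that finds 4 of the 5 rare instances at the cost of 10 false alarms.
y_pred = [1] * 10 + [0] * 85 + [1] * 4 + [0] * 1
finds_rare = rare_class_metrics(y_true, y_pred)

for name, result in (("all negative", all_negative), ("finds rare", finds_rare)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

On this toy data the all-negative classifier reaches high accuracy while its rare-class recall, g-mean, and f-measure are all zero, which is the failure mode the abstract attributes to accuracy-driven methods.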
Keywords/Search Tags:rare-class, logistic discrimination, recall, precision, classification