Research On Contrast Pattern-based Classification For Imbalanced Data

Posted on: 2019-11-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Gao
GTID: 2428330545973851
Subject: Computer technology
Abstract/Summary:
With the emergence of large volumes of data, data mining has become one of the most valuable areas of application. Data mining extracts useful information from unstructured data to support better decisions. Imbalanced data classification is a special and important data mining task. In an imbalanced problem, the minority class contains significantly fewer objects than the majority class, so the class distribution is skewed. This phenomenon arises in many applications, such as online banking fraud detection, face recognition, ozone-level forecasting, and the prediction of liver and pancreas disorders. In such problems the minority class is usually the class of interest and carries much of the important and useful information, so misclassifying minority examples can cause great losses. Improving model performance on minority examples therefore has both theoretical significance and practical value.

A contrast pattern is a pattern whose support differs significantly between classes, which gives it good discriminative ability. Because they are built from the intrinsic characteristics of the samples, contrast pattern-based classifiers are both understandable and accurate on binary classification problems. However, these classifiers do not perform well on class imbalance problems. This thesis therefore introduces a new contrast pattern-based classifier for class imbalance problems and extends it to multi-class imbalance problems. It also proposes a new decomposition strategy and applies it to contrast pattern-based classification of imbalanced data. The main work is as follows:

(1) Pattern miners usually extract a very large collection of patterns from a dataset, and some of these contrast patterns contribute little to the classification task. A large number of contrast patterns also reduces classification efficiency. In addition, in class imbalance problems, contrast pattern miners extract many patterns with high support for the majority class and only a few patterns with low support for the minority class. As a result, the sum of the supports of the majority-class patterns is much larger than that of the minority-class patterns, and the classification result is biased toward the majority class. We therefore present a new contrast pattern-based classifier for class imbalance problems. The proposed method first selects suitable contrast patterns using quality measures. At the classification stage, it then combines each pattern's quality measure and class confidence proportion with the class imbalance level. Experimental results show that a classifier built from high-quality contrast patterns performs better, and that the proposed algorithm effectively reduces the bias toward the majority class and improves recognition of the minority class.

(2) In imbalanced multi-class classification, the data features and the distributions of the classes are more complex. Traditional decomposition methods tend to aggravate the imbalance within the resulting subsets and generate a large number of subsets, which is time-consuming. This thesis therefore proposes a hierarchical cluster-based framework for multi-class imbalance problems. The method decomposes multi-class imbalanced data following the idea of hierarchical clustering, reducing the imbalance between classes by grouping them according to their similarity, while the hierarchical decomposition keeps the complexity of the subsets under control. In the classification phase, we use the contrast pattern-based classifier for class imbalance problems, which exploits the intrinsic characteristics of the data to effectively reduce the classifier's bias toward the majority class. Experimental results show that similarity-based decomposition effectively reduces the degree of imbalance among the subsets, and that applying this decomposition strategy to the contrast pattern-based classifier for class imbalance problems improves classification performance, especially on the minority class.
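The abstract does not name the quality measures or give the exact scoring rule, so the following Python sketch is only a hypothetical illustration of the support-sum bias described in contribution (1) and of one simple counter-measure. It assumes transactional (itemset) data, uses support and growth rate as stand-in quality measures, takes the class confidence proportion (CCP) of a pattern for class c as s_c / (s_c + s_rest), where s_c and s_rest are its supports in c and in the remaining classes, weights each matching pattern by support times CCP, and divides each class score by the total weight of that class's patterns. `DATA`, `mine_contrast_patterns` and `classify` are illustrative names, not the thesis's implementation, and the sketch does not model the imbalance level explicitly.

```python
from itertools import combinations

# Hypothetical toy data (not from the thesis): six majority-class objects and
# two minority-class objects, each a set of items plus a class label.
DATA = [
    ({"a", "b", "c", "f"}, "maj"), ({"a", "b", "c", "f"}, "maj"),
    ({"a", "b", "c", "f"}, "maj"), ({"a", "b", "c", "f"}, "maj"),
    ({"a", "b"}, "maj"), ({"c", "f"}, "maj"),
    ({"d", "e"}, "min"), ({"c", "d"}, "min"),
]


def supports(pattern, label, data):
    """Return (support inside class `label`, support in all other classes)."""
    inside = [items for items, y in data if y == label]
    outside = [items for items, y in data if y != label]
    s_in = sum(pattern <= items for items in inside) / len(inside)
    s_out = sum(pattern <= items for items in outside) / len(outside)
    return s_in, s_out


def mine_contrast_patterns(data, min_supp=0.4, min_growth=1.5, max_len=2):
    """Brute-force miner keeping itemsets that pass two stand-in quality
    measures: support within the class and growth rate against the rest."""
    items = sorted(set().union(*(t for t, _ in data)))
    labels = sorted({y for _, y in data})
    patterns = []
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            p = frozenset(combo)
            for c in labels:
                s_in, s_out = supports(p, c, data)
                growth = float("inf") if s_out == 0 else s_in / s_out
                if s_in >= min_supp and growth >= min_growth:
                    # Class confidence proportion (assumed definition):
                    # share of the class-relative support that falls in c.
                    ccp = s_in / (s_in + s_out)
                    patterns.append((p, c, s_in, ccp))
    return patterns


def classify(x, patterns, normalize=True):
    """Vote with every pattern contained in x, weighted by support * CCP.

    With normalize=True each class score is divided by the total weight of
    that class's patterns, so a class with many high-support patterns cannot
    win merely by having more of them (a hypothetical correction).
    """
    totals, scores = {}, {}
    for p, c, supp, ccp in patterns:
        w = supp * ccp
        totals[c] = totals.get(c, 0.0) + w
        if p <= x:
            scores[c] = scores.get(c, 0.0) + w
    if normalize:
        scores = {c: s / totals[c] for c, s in scores.items()}
    return max(scores, key=scores.get) if scores else None


patterns = mine_contrast_patterns(DATA)
query = {"a", "b", "d"}
print(classify(query, patterns, normalize=False))  # -> maj (raw sums)
print(classify(query, patterns))                   # -> min (per-class normalized)
```

Here the miner keeps ten majority-class patterns and four minority-class patterns. With raw summed weights, the query {a, b, d} goes to the majority class because it matches three high-support majority patterns ({a}, {b}, {a, b}); after per-class normalization the single minority pattern {d}, which covers every minority object, wins. How the thesis actually combines quality, CCP and imbalance level is not stated in the abstract.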
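The abstract does not specify the similarity measure or linkage used to decompose multi-class data. As a rough illustration of contribution (2), the hypothetical Python sketch below assumes numeric features, uses the Euclidean distance between class centroids as the similarity, and merges the two closest class groups until one group remains. Each internal node of the resulting tree is one binary sub-problem (left classes vs right classes) on which a binary imbalance-aware classifier could be trained. `build_hierarchy`, `binary_subproblems` and `one_vs_all_ratios` are illustrative names, not the thesis's code.

```python
import math

# Hypothetical 2-D toy data (not from the thesis): class label -> points.
DATA = {
    "A": [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.2, 0.2), (0.1, 0.1)],
    "B": [(0.4, 0.3), (0.5, 0.2)],
    "C": [(5.0, 5.0), (5.2, 4.9), (4.9, 5.1)],
    "D": [(5.5, 5.4)],
}


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def build_hierarchy(data):
    """Agglomeratively merge the two most similar class groups until one remains.

    A node is (classes, left, right, points); leaves have left = right = None.
    Similarity is assumed to be the Euclidean distance between group centroids.
    """
    nodes = [((label,), None, None, list(pts)) for label, pts in data.items()]
    while len(nodes) > 1:
        pairs = [(i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))]
        i, j = min(pairs, key=lambda ij: math.dist(centroid(nodes[ij[0]][3]),
                                                   centroid(nodes[ij[1]][3])))
        left, right = nodes[i], nodes[j]
        merged = (left[0] + right[0], left, right, left[3] + right[3])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


def binary_subproblems(node):
    """Each internal node yields one binary task: left classes vs right classes."""
    _, left, right, _ = node
    if left is None:
        return []
    n_left, n_right = len(left[3]), len(right[3])
    ratio = max(n_left, n_right) / min(n_left, n_right)  # imbalance ratio
    return [(left[0], right[0], ratio)] + binary_subproblems(left) + binary_subproblems(right)


def one_vs_all_ratios(data):
    """Imbalance ratio of each class against all the others, for comparison."""
    total = sum(len(p) for p in data.values())
    return {c: max(len(p), total - len(p)) / min(len(p), total - len(p))
            for c, p in data.items()}


root = build_hierarchy(DATA)
for left, right, ratio in binary_subproblems(root):
    print(left, "vs", right, "IR =", ratio)
print(one_vs_all_ratios(DATA))
```

On this toy data the tree yields K - 1 = 3 binary sub-problems ({A, B} vs {C, D}, A vs B, C vs D) with imbalance ratios of 2, 3 and 3, whereas a one-vs-all split pits class D against eleven objects (ratio 11). Whether similarity-based grouping lowers imbalance on real data depends on the data; this example only shows the mechanism.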
Keywords/Search Tags: Data Mining, Class Imbalance, Contrast Pattern, Quality Measures, Multi-class Imbalance Problems