
The Algorithm Research Of Associative Classification And Classification Based On Imbalanced Data

Posted on: 2017-09-03 | Degree: Master | Type: Thesis
Country: China | Candidate: W P Wang
GTID: 2348330485956513 | Subject: Computer application technology
Abstract/Summary:
Classification is one of the most important research directions in data mining; it aims to build a classifier or model that predicts the class labels of unseen objects. Classification algorithms based on association rules typically produce a large number of rules and achieve good classification accuracy. State-of-the-art associative classification methods commonly rely on the support-confidence framework. However, this framework does not measure the correlation between an itemset and a class, and it ignores the relative proportions of the different classes. Consequently, these methods perform poorly when the class distribution of the training data is skewed.

Furthermore, the classification of imbalanced data has attracted considerable attention in data mining. An imbalanced class distribution means that one class (the minority or positive class) has far fewer cases than the others (the majority or negative classes). On imbalanced data, conventional classification techniques attempt to maximize overall classification accuracy, which yields biased classifiers with high predictive accuracy on the majority classes but poor predictive accuracy on the minority class. In real-world applications, minority class samples are often of greatest interest, and the cost of misclassifying a minority sample as a majority one can be very high. The central challenge of class imbalance learning is therefore to improve predictive accuracy on the minority class without sacrificing overall predictive accuracy.

The main research work of this thesis is as follows.

First, we propose ACSER, an improved associative classification approach based on support and enhancement ratio, which modifies the classical support-confidence approach. ACSER extracts candidate classification rules from the training data as frequent enhancement-ratio patterns. ACSER also sorts and prunes the extracted rule set according to a new definition of rule intensity that integrates confidence and enhancement ratio. With these two improvements, ACSER obtains a more reasonable rule priority and achieves higher classification accuracy.

Second, we propose ACIW, a new instance-weighted associative classification algorithm for imbalanced data. ACIW increases the weight of each minority class case according to the distance between that case and all majority class cases in the original imbalanced data. This weighting considerably improves the measures of itemsets belonging to the minority class, especially for cases that are harder to learn. ACIW then applies ACSER to the weighted data set to construct the classifier. Experimental results show that ACIW not only increases the number and priority of minority class rules but also achieves a higher recognition rate on minority samples while maintaining promising overall predictive accuracy.

Finally, we propose ASMOTE-Boost, a new imbalanced data learning algorithm that combines an adaptive synthetic minority over-sampling technique (ASMOTE) with ensemble classification. The algorithm uses the k-nearest-neighbor (k-NN) method to identify and filter out noisy minority samples, and then takes full account of the distribution of the filtered training samples. The synthetic rate of each minority case adapts to its learning difficulty: the harder a minority case is to learn, the higher its synthetic rate. Extensive experiments indicate that ASMOTE-Boost outperforms several other methods and is an effective and feasible approach to imbalanced data classification.
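To illustrate how support and confidence alone can misorder rules under class skew, the Python sketch below computes support, confidence, and an enhancement ratio for class association rules on a toy data set. The abstract does not define enhancement ratio or rule intensity, so both are assumptions here: enhancement ratio is taken to be a rule's confidence divided by the prior proportion of its class (a lift-style measure), and the hypothetical `intensity` is simply confidence times enhancement ratio. The thesis's actual definitions may differ.

```python
# Toy training data (hypothetical): each case is (set of items, class label); 8 "neg", 2 "pos".
DATA = [
    ({"a"}, "neg"), ({"a"}, "neg"), ({"a"}, "neg"), ({"a", "b"}, "neg"),
    ({"a"}, "pos"), ({"b"}, "pos"),
    ({"c"}, "neg"), ({"c"}, "neg"), ({"c"}, "neg"), ({"c"}, "neg"),
]

def rule_measures(data, itemset, label):
    """Return (support, confidence, enhancement_ratio) of the rule itemset -> label."""
    n = len(data)
    covered = [c for items, c in data if itemset <= items]   # cases containing the itemset
    n_rule = sum(1 for c in covered if c == label)           # ...that also carry the label
    prior = sum(1 for _, c in data if c == label) / n        # class proportion P(label)
    support = n_rule / n
    confidence = n_rule / len(covered) if covered else 0.0
    # ASSUMPTION: enhancement ratio = confidence / P(label), i.e. a lift-style measure.
    enhancement = confidence / prior if prior else 0.0
    return support, confidence, enhancement

def intensity(data, rule):
    """HYPOTHETICAL rule intensity: confidence multiplied by enhancement ratio."""
    _, conf, er = rule_measures(data, *rule)
    return conf * er

RULES = [({"a"}, "neg"), ({"b"}, "pos")]
by_confidence = sorted(RULES, key=lambda r: rule_measures(DATA, *r)[1], reverse=True)
by_intensity = sorted(RULES, key=lambda r: intensity(DATA, r), reverse=True)
```

Ranked by confidence, `{a} -> neg` (confidence 0.8, enhancement ratio 1.0) comes first even though `{a}` carries no information about the class: its confidence merely equals the 0.8 prior of `neg`. The hypothetical intensity instead ranks `{b} -> pos` (confidence 0.5, enhancement ratio 2.5) first, because that rule raises the minority class well above its 0.2 prior.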
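The bias of accuracy-maximizing classifiers on imbalanced data shows up in simple arithmetic. The Python sketch below scores a trivial classifier that always predicts the majority class on a hypothetical data set of 95 majority and 5 minority cases: overall accuracy looks high, while minority recall (the recognition rate of minority samples) is zero.

```python
def accuracy_and_minority_recall(y_true, y_pred, minority):
    """Return (overall accuracy, recall on the minority class)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    n_min = sum(t == minority for t in y_true)
    hits = sum(t == minority and p == minority for t, p in zip(y_true, y_pred))
    return correct / len(y_true), (hits / n_min if n_min else 0.0)

# Hypothetical labels: 95 majority (0) and 5 minority (1) cases.
y_true = [0] * 95 + [1] * 5
always_majority = [0] * 100  # a "classifier" that never predicts the minority class
acc, rec = accuracy_and_minority_recall(y_true, always_majority, minority=1)
print(f"accuracy={acc:.2f}, minority recall={rec:.2f}")  # accuracy=0.95, minority recall=0.00
```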
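The instance-weighting idea behind ACIW can be sketched in Python as follows. The abstract states only that a minority case's weight grows according to its distance from all majority cases, so the concrete rule here is an assumption: each minority case receives a raw weight inversely proportional to its mean Euclidean distance to the majority cases (cases lying closer to the majority class, which are harder to learn, weigh more), and the weights are rescaled so that the minority class's total weight equals the number of majority cases, which keep weight 1. Weighted support and confidence then sum weights instead of counting cases. All function names and data are hypothetical.

```python
import math

def aciw_style_weights(minority, majority):
    """HYPOTHETICAL weighting: the closer a minority point is to the majority class,
    the larger its weight. minority, majority: lists of numeric feature tuples.
    Assumes every minority point has a positive mean distance to the majority points."""
    mean_dist = [sum(math.dist(x, m) for m in majority) / len(majority) for x in minority]
    raw = [1.0 / d for d in mean_dist]
    scale = len(majority) / sum(raw)  # minority total weight == number of majority cases
    return [r * scale for r in raw]

def weighted_measures(cases, weights, itemset, label):
    """Weighted support and confidence of itemset -> label (weight sums replace counts)."""
    total = sum(weights)
    cover = sum(w for (items, _), w in zip(cases, weights) if itemset <= items)
    hit = sum(w for (items, c), w in zip(cases, weights) if itemset <= items and c == label)
    return hit / total, (hit / cover if cover else 0.0)

# Hypothetical data: each case has features (for distances) and items (for rules).
majority_pts = [(0, 0), (1, 0), (0, 1), (1, 1)]
minority_pts = [(2, 2), (5, 5)]  # (2, 2) lies nearer the majority cluster
w_min = aciw_style_weights(minority_pts, majority_pts)

cases = [({"a"}, "neg"), ({"a"}, "neg"), ({"b"}, "neg"), ({"b"}, "neg"),
         ({"a", "b"}, "pos"), ({"b"}, "pos")]  # last two cases are the minority points
unweighted = weighted_measures(cases, [1.0] * 6, {"b"}, "pos")
weighted = weighted_measures(cases, [1.0] * 4 + w_min, {"b"}, "pos")
```

Without weights, the minority rule `{b} -> pos` has support 2/6 and confidence 0.5; with the assumed weights, both rise (to 0.5 and about 0.67), which is the kind of improvement in minority itemset measures that the abstract describes.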
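The ASMOTE steps (k-NN noise filtering followed by synthesis weighted by learning difficulty) can be sketched in Python under explicit assumptions. Here a minority case is treated as noise when all k of its nearest neighbours are majority cases. Its learning difficulty is the fraction of majority cases among those neighbours (an ADASYN-style measure). The total number of synthetic cases is `beta` times the gap between the majority count and the filtered minority count. Each synthetic case is a SMOTE-style interpolation toward a randomly chosen one of its k nearest retained minority neighbours. These choices, `asmote_sketch`, and its parameters are hypothetical, and the boosting stage of ASMOTE-Boost is not shown.

```python
import math
import random

def _neighbours(i, X, k):
    """Indices of the k points nearest to X[i] (Euclidean), excluding i itself."""
    others = [j for j in range(len(X)) if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]

def asmote_sketch(X, y, k=5, beta=1.0, seed=0):
    """HYPOTHETICAL ASMOTE-style oversampling; y uses 1 = minority, 0 = majority.
    Returns (indices of retained minority cases, list of synthetic points)."""
    rng = random.Random(seed)
    minority = [i for i, c in enumerate(y) if c == 1]
    n_major = len(y) - len(minority)

    # 1) k-NN noise filter: drop minority cases whose k neighbours are all majority.
    difficulty = {}
    for i in minority:
        n_major_nb = sum(1 for j in _neighbours(i, X, k) if y[j] == 0)
        if n_major_nb < k:
            difficulty[i] = n_major_nb / k  # fraction of majority neighbours
    kept = sorted(difficulty)

    # 2) Adaptive rate: harder cases get a larger share of the synthetic budget.
    synthetic = []
    total_g = int(round((n_major - len(kept)) * beta))
    if total_g <= 0 or not kept:
        return kept, synthetic
    total_d = sum(difficulty.values())
    for i in kept:
        share = difficulty[i] / total_d if total_d > 0 else 1 / len(kept)
        g_i = int(round(share * total_g))
        # 3) SMOTE-style interpolation toward a random nearby retained minority case.
        peers = sorted((j for j in kept if j != i), key=lambda j: math.dist(X[i], X[j]))[:k]
        if not peers:
            continue
        for _ in range(g_i):
            nb = X[rng.choice(peers)]
            u = rng.random()
            synthetic.append(tuple(a + u * (b - a) for a, b in zip(X[i], nb)))
    return kept, synthetic

# Hypothetical data: 8 majority points, then 5 minority points.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (2, 0), (3, 3), (4, 3),
     (0.5, 0.5), (4, 4), (6, 6), (6, 7), (7, 6)]
y = [0] * 8 + [1] * 5
kept, synthetic = asmote_sketch(X, y, k=3)
```

In this example, the minority point at (0.5, 0.5) is surrounded by majority points and is filtered as noise. The borderline point (4, 4) has two majority neighbours out of three, so it receives all four synthetic cases, while the safe points at (6, 6), (6, 7) and (7, 6) receive none. Because each case's share is rounded separately, the total number of synthetic cases can differ slightly from `total_g` in general.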
Keywords: Data mining, Classification, Imbalanced data, Association rules, Instance weights, Over-sampling, Ensemble learning