The Study Of Association Rule Based Classification For Imbalanced Data

Posted on: 2016-10-20
Degree: Master
Type: Thesis
Country: China
Candidate: X J Cui
GTID: 2308330461978626
Subject: Management Science and Engineering
Abstract/Summary:
Association rule based classification, also called Associative Classification (AC), is an important field in data mining research. Because of its strong interpretability and high classification accuracy, AC has become a hot topic in intelligent decision-making. However, imbalanced data poses a challenge for AC methods. Imbalanced data is data with a skewed class distribution, in which the minority class has far fewer records than the majority class. Many practical applications involve imbalanced data sets, such as intrusion detection, forest fire forecasting, and credit fraud detection. In such applications, people care more about the classification accuracy of the minority class, because misclassifying a minority instance is very costly. Hence, it is essential to enhance classification accuracy, especially that of the minority class.

Traditional AC algorithms cannot deal well with imbalanced data, for two main reasons. First, the Interestingness Measures (IMs) used in the AC process are based on the "confidence-support" framework. When applied to imbalanced data, "confidence-support" based AC methods may generate few rules related to the minority class, or many useless rules. IMs play an important role in the rule generation, pruning, and ranking phases that yield interesting rules, so it is essential to select appropriate IMs for AC when dealing with imbalanced data. Second, for data with an imbalanced class distribution, the classifier generated by an AC method tends to predict a test object as the majority class and to ignore the effect of the minority class, which results in classification rules of poor quality.
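The first reason can be made concrete with a small calculation. The sketch below is illustrative only: the toy records, the attribute names, and the threshold of 0.1 are assumptions, not data or settings from the thesis. Lift is included as one well-known alternative measure and is not claimed to be among the measures the thesis selects. The sketch shows how a perfectly accurate minority-class rule can fall below a global minimum-support threshold.

```python
# Hypothetical toy data: 95 majority ("normal") and 5 minority ("fraud") records.
# Each record is (set of attribute items, class label).
records = (
    [({"amount=low", "country=home"}, "normal")] * 90
    + [({"amount=high", "country=home"}, "normal")] * 5
    + [({"amount=high", "country=abroad"}, "fraud")] * 4
    + [({"amount=low", "country=abroad"}, "fraud")] * 1
)


def rule_stats(antecedent, label, data):
    """Return (support, confidence, lift) of the rule `antecedent -> label`."""
    n = len(data)
    # Class labels of all records that contain every item of the antecedent.
    covered = [cls for items, cls in data if antecedent <= items]
    both = sum(1 for cls in covered if cls == label)
    class_count = sum(1 for _, cls in data if cls == label)
    support = both / n
    confidence = both / len(covered) if covered else 0.0
    # Lift compares the rule's confidence with the class prior.
    lift = confidence / (class_count / n) if class_count else 0.0
    return support, confidence, lift


def survives(antecedent, label, data, min_support=0.1, min_confidence=0.8):
    """Confidence-support filter with assumed (hypothetical) thresholds."""
    support, confidence, _ = rule_stats(antecedent, label, data)
    return support >= min_support and confidence >= min_confidence


majority_rule = ({"country=home"}, "normal")
minority_rule = ({"country=abroad"}, "fraud")

for antecedent, label in (majority_rule, minority_rule):
    print(sorted(antecedent), "->", label,
          rule_stats(antecedent, label, records),
          "kept" if survives(antecedent, label, records) else "pruned")
```

Both rules have confidence 1.0 on this toy data, but the minority rule can never exceed the minority class's share of the records in support, so a single global support threshold prunes it. A measure that accounts for the class prior ranks the two rules very differently.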
For the above reasons, this paper conducts research on the following two aspects:

(1) For the Interestingness Measures used in the AC process, the study seeks measures suited to imbalanced data classification, which can improve AC’s ability to deal with imbalanced data. On the one hand, this paper puts forward a Stable Strongly Correlated Measures Mining method to mine the measures that are strongly correlated in most cases and to compare their behaviors. On the other hand, in order to select all the good measures, this paper ranks all the measures under each class distribution based on their classification performance. After selection and behavior analysis, the good IMs are obtained and classified into two groups with different characteristics.

(2) In data and rule processing, the research aims to ensure the quality of the classification association rules, which ultimately improves the performance of AC on imbalanced data. First, from the data aspect, this paper proposes the Key Value Sampling (KVS) technique, which samples the original imbalanced data and achieves class balance by removing the instances weakly correlated with the majority class and increasing those strongly correlated with the minority class. Second, from the rule aspect, this paper proposes Rule Validation, which makes full use of an advantage of the AC method: one can update or tune a rule without affecting other rules. This method validates the initially generated classifier and improves the rules that perform badly, which enhances the performance of the whole classifier.

To sum up, this paper aims to improve AC’s performance by selecting superior IMs and by improving the traditional algorithm. Both aspects improve AC’s performance in dealing with imbalanced data. Experimental analysis shows the effectiveness of the above methods and the reliability of the conclusions.
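The abstract does not give the details of Rule Validation, so the following is only a rough sketch of the general idea that each rule in an associative classifier can be checked on held-out data independently of the others. The `Rule` class, the precision criterion, and the `min_precision` threshold are all assumptions made for illustration and are not the thesis's procedure. The sketch only flags weak rules; how the thesis then improves them is not specified in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical class association rule: antecedent items -> class label."""
    antecedent: frozenset
    label: str


def rule_precision(rule, validation):
    """Fraction of matched validation records that carry the rule's label.

    Returns None when the rule matches no validation record.
    """
    matched = [cls for items, cls in validation if rule.antecedent <= items]
    if not matched:
        return None
    return sum(cls == rule.label for cls in matched) / len(matched)


def validate_rules(rules, validation, min_precision=0.6):
    """Split rules into (kept, flagged); each rule is judged on its own."""
    kept, flagged = [], []
    for rule in rules:
        precision = rule_precision(rule, validation)
        if precision is not None and precision >= min_precision:
            kept.append(rule)
        else:
            # Candidate for tuning or replacement; other rules are untouched.
            flagged.append(rule)
    return kept, flagged


# Hypothetical held-out records: (set of items, class label).
validation = [
    ({"a", "b"}, "pos"),
    ({"a"}, "neg"),
    ({"a", "c"}, "neg"),
    ({"b"}, "pos"),
]
rules = [Rule(frozenset({"a"}), "pos"), Rule(frozenset({"b"}), "pos")]
kept, flagged = validate_rules(rules, validation)
```

Because rules are evaluated one at a time, a flagged rule can be revised without retraining or re-ranking the rest of the classifier, which is the property the abstract highlights.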
Keywords/Search Tags: Associative Classification, Imbalanced Data, Interestingness Measure Selection, Data Sampling, Rule Validation