
The Algorithm Research Of Contrast Patterns Mining Based On Imbalanced Datasets

Posted on: 2017-07-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z Z Liu
GTID: 2428330488971851
Subject: Computer Science and Technology
Abstract/Summary:
Data mining, the discovery of knowledge in databases, plays an increasingly important role in the development of information technology. Contrast pattern mining is a core topic in the field and lays a foundation for classification, clustering, association rule mining, and other data mining tasks. A contrast pattern is a pattern whose support differs greatly between two datasets. Such patterns discriminate strongly between classes, and classifiers built on them often achieve better classification results than some simple classifiers. However, most current contrast pattern mining algorithms rest on two assumptions. First, they assume that each class has roughly the same number of training samples. Imbalanced datasets are often handled as if the class distribution were balanced, so the information in the rare class is swamped by the information in the large class and the results are poor. Second, they assume that all samples come from the same domain. When the training and test datasets come from different domains, recollecting training data and rebuilding the classification model is very expensive. This thesis examines these two assumptions and makes the following contributions.

(1) We analyze the influence of the relationship between the majority class and the minority class, propose a new kind of contrast pattern called balanced emerging patterns (BEPs), and thereby establish a basis for contrast pattern mining on imbalanced datasets. For an imbalanced dataset, we then use a sliding window to split the data, which reduces the imbalance ratio and improves the generalization ability of the mining model. During mining, the minority class samples stay fixed in the window while the majority class samples flow across it, so each window position forms a sub-dataset, together with the minority class data, whose class distribution is relatively balanced. Within each window, we mine the new contrast patterns using a sorted frequent pattern tree structure. As the majority class samples flow, many windows are formed; we mine contrast patterns from each window's data and build a sub-classifier from them, until the window has finished sliding.

(2) Shared balanced emerging patterns (SBEPs) enable transfer between imbalanced datasets from multiple domains. We propose an SBEP-based algorithm for measuring the similarity of imbalanced datasets. The algorithm defines how to compute the similarity quality of SBEPs, including the overall similarity quality and the average similarity quality; the number of SBEPs is then normalized, and the contributions of the SBEPs are aggregated to measure the similarity between datasets. Classification experiments show that classifier accuracy is high when highly similar datasets are used as auxiliary data.
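
The sliding-window split in (1) can be illustrated with a minimal Python sketch. The abstract does not give the formal definition of BEPs, the window size, or the details of the sorted frequent pattern tree, so everything named here is an assumption made for illustration: `sliding_windows`, `support`, and `growth_rate` are hypothetical helpers, the windows are taken as non-overlapping chunks of the majority class by default, and the classic emerging-pattern growth rate (ratio of supports) stands in for whatever measure the thesis actually uses.

```python
from typing import FrozenSet, Iterator, List, Optional, Sequence, Tuple

Transaction = FrozenSet[str]


def sliding_windows(
    minority: Sequence[Transaction],
    majority: Sequence[Transaction],
    window_size: int,
    step: Optional[int] = None,
) -> Iterator[Tuple[List[Transaction], List[Transaction]]]:
    """Yield (minority, majority_chunk) sub-datasets.

    The minority samples are kept fixed in every window, while the
    majority samples pass through window_size at a time. The default
    step (assumed, not from the thesis) gives non-overlapping chunks.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    step = window_size if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")
    for start in range(0, len(majority), step):
        chunk = list(majority[start:start + window_size])
        if chunk:
            yield list(minority), chunk


def support(pattern: Transaction, data: Sequence[Transaction]) -> float:
    """Fraction of transactions in data that contain every item of pattern."""
    if not data:
        return 0.0
    return sum(1 for t in data if pattern <= t) / len(data)


def growth_rate(
    pattern: Transaction,
    target: Sequence[Transaction],
    background: Sequence[Transaction],
) -> float:
    """Standard emerging-pattern growth rate: support ratio target/background.

    Used here only as a stand-in for the thesis's BEP measure.
    """
    s_target = support(pattern, target)
    s_background = support(pattern, background)
    if s_background == 0.0:
        return float("inf") if s_target > 0.0 else 0.0
    return s_target / s_background


# Toy data: 2 minority samples against 6 majority samples (ratio 1:3).
minority = [frozenset({"a", "b"}), frozenset({"a", "c"})]
majority = [
    frozenset({"b"}), frozenset({"b", "c"}),
    frozenset({"a", "b"}), frozenset({"c"}),
    frozenset({"b", "c"}), frozenset({"c"}),
]

pattern = frozenset({"a"})
for index, (mino, majo) in enumerate(sliding_windows(minority, majority, 2)):
    # Each window pairs all minority samples with 2 majority samples (1:1).
    print(index, len(mino), len(majo), growth_rate(pattern, mino, majo))
```

In this toy setup each window has a 1:1 class ratio instead of the original 1:3, and a pattern's contrast is evaluated per window. In the approach described above, the patterns mined from each window would then feed one sub-classifier per window; that step is omitted here because the abstract does not specify how the sub-classifiers are built or combined.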
Keywords/Search Tags:Data Mining, Contrast Pattern, Imbalanced Datasets, Similarity Measure