Multi-threshold Based Contrast Pattern Mining And Its Application In Classification Of Imbalanced Datasets

Posted on:2020-02-24

Degree:Master

Type:Thesis

Country:China

Candidate:Y G Lan

Full Text:PDF

GTID:2428330623951420

Subject:Computer technology

Abstract/Summary:

Data mining is one of the most active research areas of the 21st century. As data mining technology has developed, it has become possible to extract knowledge that is easy to understand and easy to store from irregular data. Traditional contrast pattern mining algorithms for imbalanced datasets are usually based on a single support threshold. As the dataset grows, itemsets with strong discriminative power but low support may be lost, even though such itemsets could improve, to some extent, the performance of contrast-pattern-based classifiers. In addition, traditional contrast pattern mining algorithms generally run serially on a single machine. Limited by the CPU, memory, and other resources of that machine, they run into a bottleneck; on today's large-scale, high-dimensional data in particular, they are inefficient and scale poorly.

Although traditional contrast pattern classification algorithms for imbalanced datasets can alleviate the class bias problem to some extent, they often have the following defects. First, when the dataset is large, they still suffer from class bias because they depend too heavily on pattern support. Second, they tend to weight the samples being classified according to pattern support, so the classification result is overly influenced by it.

To address these problems, this thesis makes the following contributions:

(1) A parallel contrast pattern mining algorithm based on multiple support thresholds is proposed. During mining, the algorithm dynamically selects a reasonable support threshold for each itemset according to its frequency count, which addresses the problem that traditional contrast pattern mining algorithms may filter out itemsets that have discriminative power. The algorithm can also run on MapReduce-like batch processing frameworks such as Spark to process large-scale, high-dimensional datasets. The basic idea is to divide the search space into small, independent units; because these units do not depend on one another, they can be mined in parallel. Experiments on multiple UCI datasets show that the proposed algorithm mines more discriminative contrast patterns, which improves classification accuracy in the subsequent classification experiments. The algorithm can also mine large-scale data in a reasonable amount of time, and mining time can be reduced by adding computing nodes, so it scales well.

(2) Based on the imbalance ratio (IR) of the dataset, a contrast pattern classification algorithm based on reward and punishment coefficients is proposed. The algorithm calculates the reward and punishment coefficient of a sample from the IR of each class and uses that coefficient to calculate the sample's classification score for each class, which overcomes, to some extent, the defects of traditional contrast pattern classification algorithms for imbalanced datasets.
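The two ideas in the abstract can be illustrated with a small, single-machine Python sketch. It is not the thesis's algorithm: the abstract gives no formulas, so the threshold rule (`multi_threshold`, with hypothetical parameters `beta` and `floor`), the growth-rate test, and the coefficient rule in `class_coefficients` are all assumptions chosen only to show the behaviour described, and the toy data is made up. The parallel, Spark-based partitioning is not shown.

```python
from collections import Counter
from itertools import combinations


def supports(transactions, max_len=2):
    """Relative support of every itemset with at most max_len items."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_len + 1):
            counts.update(combinations(items, k))
    return {p: c / len(transactions) for p, c in counts.items()}


def single_threshold(pattern, sup):
    """Baseline: one fixed support threshold for every pattern."""
    return 0.3


def multi_threshold(pattern, sup, beta=0.5, floor=0.1):
    """Assumed rule: each item gets max(beta * its support, floor);
    a pattern uses the smallest threshold among its items."""
    return min(max(beta * sup[(i,)], floor) for i in pattern)


def mine_contrast(target, other, threshold_for, min_growth=1.5):
    """Keep patterns that pass their support threshold in `target`
    and are at least min_growth times rarer in `other`."""
    sup_t, sup_o = supports(target), supports(other)
    found = {}
    for pattern, s in sup_t.items():
        o = sup_o.get(pattern, 0.0)
        growth = float("inf") if o == 0 else s / o
        if s >= threshold_for(pattern, sup_t) and growth >= min_growth:
            found[pattern] = s
    return found


def class_coefficients(class_sizes):
    """Assumed stand-in for reward/punishment coefficients:
    largest class size divided by this class's size (its IR)."""
    largest = max(class_sizes.values())
    return {c: largest / n for c, n in class_sizes.items()}


def classify(sample, patterns_by_class, coeffs):
    """Score each class by coefficient * summed support of matched patterns."""
    items = set(sample)
    scores = {
        c: coeffs[c] * sum(s for p, s in pats.items() if set(p) <= items)
        for c, pats in patterns_by_class.items()
    }
    return max(scores, key=scores.get), scores


# Toy data (made up): item "x" is rare in the target class but never
# occurs in the other class.
pos = [["a", "b"]] * 4 + [["a", "c"]] * 2 + [["a", "x"]] * 2 + [["b", "c"]] * 2
neg = [["a", "b"]] * 6 + [["a", "c"]] * 2 + [["b"]] * 2

single = mine_contrast(pos, neg, single_threshold)
multi = mine_contrast(pos, neg, multi_threshold)

coeffs = class_coefficients({"minority": 10, "majority": 40})
toy_patterns = {"minority": {("x",): 0.2}, "majority": {("a",): 0.3}}
label, scores = classify(["a", "x"], toy_patterns, coeffs)
```

On this toy data the fixed threshold drops the pattern `("x",)` because its support is only 0.2, while the per-item threshold keeps it. In the classification step, the class-size coefficient lets the low-support minority pattern outweigh the higher-support majority pattern, which is the kind of correction the abstract describes.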

Keywords/Search Tags:

data mining, classification, parallel mining, contrast pattern mining, unbalanced data set

Related items

1	Study And Implementation On Techniques Of Parallel Mining Of Frequent Closed Sequences Based On Vertical Segmentation
2	The Research And Implement Of Algorithm On Web Usage Mining
3	Research On Classification Algorithms Of Data Mining Based On Imbalanced Data Sets
4	Design Of Frequent Pattern Mining Algorithm LPS-Miner And Research On Parallel Formulations
5	Research And Application Of Mining Access Sequential Pattern In Weblog
6	Research On SVM Classification Of Unbalanced Data And Its Application In Identify Poor Students In Colleges And Universities
7	Study On Several Typical Data Mining Methods And Their Applications
8	A Multi-flow Streaming Data Frequent Pattern Mining Adaptive Algorithm
9	The Research Of Conditional Discriminative Pattern Mining Algorithms
10	Research On Contrast Pattern-based Classification For Imbalanced Data