Research On Resampling Technology Based On Fuzzy Knowledge

Posted on: 2022-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: M X Jiang
GTID: 2480306605971209
Subject: Applied Mathematics
Abstract/Summary:
With the advent of the "big data" era, data mining has become a research hotspot. Researchers have proposed many effective data mining methods from different perspectives and achieved excellent results, but they have also encountered many challenges. Class imbalance is one of them. To address it, many algorithm-level methods have been proposed. These methods mainly modify traditional machine learning algorithms to pay more attention to the minority class, or impose a higher penalty on misclassifying the minority class to reduce the classifier's preference for the majority class. However, the resulting improvement in classification performance is not obvious. Ensemble learning, which enhances the performance of a single classifier, is an effective way to deal with imbalanced data. Based on the Boosting and Bagging frameworks, this thesis proposes two ensemble algorithms.

To address the problem that a single algorithm does not classify imbalanced datasets well, this thesis proposes a boosting algorithm based on fuzzy entropy and fuzzy support. It is an ensemble algorithm built on the Boosting framework that fuses data resampling with classifier training and so extends the field of data preprocessing. First, the algorithm constructs the class global entropy. Second, according to the class global entropy of the majority class samples, the region occupied by the majority class samples is divided into safe areas and boundary areas, and a density peak-based clustering algorithm selects representative samples from the safe area to complete static resampling. Third, the Boosting classifier is trained. Before each iteration, the classifiers trained so far are used to calculate the average class support of the majority samples, which is combined with the class global entropy to undersample again. Finally, to verify the effectiveness of the algorithm, comparison experiments with traditional ensemble algorithms are carried out on 9 artificial datasets and 34 real datasets. The experimental results show that the new algorithm is significantly better than the other algorithms.

To overcome the shortcomings of resampling at the data level and reduce the training cost of the ensemble classifier, this thesis also proposes a second, algorithm-level method: the oversampling algorithm based on clustering and random forest. It is a method of "oversampling" the classifiers, based on the Bagging framework. It first performs static undersampling on the dataset. It then identifies the key regions of the dataset and trains classifiers on the original data and on the key regions respectively, adjusting the proportion of the number of classifiers according to the weights. Finally, comparative experiments are carried out on 15 artificial datasets and 9 KEEL datasets. The experimental results show that this method is better than oversampling at the data level and also outperforms traditional ensemble algorithms.
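The static resampling step of the first algorithm can be illustrated with a minimal sketch. The abstract does not give the definition of class global entropy, so the code below assumes a simple stand-in: the binary entropy of class membership among each sample's k nearest neighbours. The density peak-based selection of representatives is replaced here by plain random selection. The names `knn_class_entropy`, `static_undersample`, `threshold`, and `keep_ratio` are hypothetical and are not taken from the thesis.

```python
import numpy as np


def knn_class_entropy(X, y, k=5):
    """Binary entropy of minority/majority membership among the k nearest
    neighbours of each sample (an ASSUMED stand-in for class global entropy)."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # squared distances
    np.fill_diagonal(d, np.inf)                # a sample is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]          # indices of the k nearest neighbours
    p_min = (y[nn] == 1).mean(axis=1)          # fuzzy membership in minority class
    p = np.stack([p_min, 1.0 - p_min], axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, -p * np.log2(p), 0.0)  # treat 0*log(0) as 0
    return terms.sum(axis=1)


def static_undersample(X, y, k=5, threshold=0.0, keep_ratio=0.5, seed=0):
    """Split majority samples (label 0) into safe and boundary areas by entropy,
    keep every boundary sample, and keep a fraction of the safe samples.
    Random selection is an ASSUMPTION replacing density peak clustering."""
    rng = np.random.default_rng(seed)
    h = knn_class_entropy(X, y, k)
    maj = np.flatnonzero(y == 0)
    safe = maj[h[maj] <= threshold]            # neighbourhood is class-pure
    boundary = maj[h[maj] > threshold]         # neighbourhood mixes both classes
    n_keep = max(1, int(round(keep_ratio * len(safe)))) if len(safe) else 0
    kept_safe = rng.choice(safe, size=n_keep, replace=False) if n_keep else safe
    keep = np.sort(np.concatenate([np.flatnonzero(y == 1), boundary, kept_safe]))
    return X[keep], y[keep], safe, boundary


# Toy imbalanced data: 80 majority samples and 20 overlapping minority samples.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(80, 2)),
               rng.normal(2.0, 1.0, size=(20, 2))])
y = np.array([0] * 80 + [1] * 20)

X_res, y_res, safe, boundary = static_undersample(X, y)
print("safe:", len(safe), "boundary:", len(boundary),
      "class counts after resampling:", np.bincount(y_res))
```

In this sketch all minority samples and all boundary-area majority samples are kept, and only the safe area is thinned. One limitation of the stand-in is that a majority sample surrounded entirely by minority neighbours also has zero entropy and is therefore treated as safe. The dynamic undersampling inside each Boosting iteration, which combines class support with the entropy, is not shown.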
Keywords/Search Tags: Class imbalance, Fuzzy entropy, Fuzzy support, Ensemble algorithm, Oversampling algorithm