Research On Classification Algorithm For Imbalanced Data

Posted on:2021-05-31

Degree:Master

Type:Thesis

Country:China

Candidate:P Zhou

Full Text:PDF

GTID:2428330602964570

Subject:Communication and Information System

Abstract/Summary:

With the rapid development of information technology and the Internet, the amount of data in many fields is growing at an unprecedented rate. How to process data intelligently and extract the valuable information it contains has become a research and application hotspot in machine learning and data mining. Data classification, an important topic in data mining, is widely used in data analysis and intelligent processing. Traditional classification methods achieve satisfactory results on balanced datasets. In practice, however, many commonly used datasets are imbalanced, and traditional classification algorithms cannot guarantee good classification performance on the minority-class samples. This thesis studies classification algorithms for imbalanced datasets at both the data level and the algorithm level:

(1) At the data level, a weighted bi-directional sampling method based on k-means (WBSK) is proposed for imbalanced datasets. The method first uses k-means to cluster the whole dataset. It then oversamples the clusters that contain a large number of minority-class samples, giving each cluster a different weight according to the imbalance ratio; this avoids generating noisy samples and effectively addresses both between-class and within-class imbalance. Finally, it undersamples the clusters dominated by the majority class so that the numbers of samples in the two classes are balanced across the whole dataset. Experimental results on 11 datasets show that the proposed method outperforms the compared methods across different classifiers and evaluation criteria.

(2) At the algorithm level, a fixed-radius nearest neighbor progressive competition algorithm (FRNNPC) is proposed. As a preprocessing step, FRNNPC applies the fixed-radius nearest neighbor rule to eliminate ineligible samples globally. It then applies nearest neighbor progressive competition (NPC) to the resulting candidate set, progressively accumulating the scores of the query sample's nearest neighbors until the total score of one class exceeds that of the other. The method handles class imbalance effectively and requires no manually set parameters. Experiments on 10 imbalanced datasets compare the proposed method with other representative algorithms and demonstrate its effectiveness.
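The WBSK procedure in (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract gives neither the weighting formula nor how the class gap is split between the two sampling directions. The sketch therefore assumes that a cluster counts as minority-rich when its minority fraction is at least the dataset-wide minority fraction, weights each such cluster by its minority count, creates synthetic minority points by SMOTE-style interpolation between two minority samples of the same cluster, and closes half of the class gap by oversampling and the other half by randomly dropping majority samples from the remaining clusters. The names `kmeans_labels` and `wbsk` are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist2(a: Point, b: Point) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_labels(X: Sequence[Point], k: int, rng: random.Random, iters: int = 50) -> List[int]:
    """Plain Lloyd's k-means; returns one cluster label per sample."""
    centers = rng.sample(list(X), k)
    labels = [0] * len(X)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: _dist2(x, centers[j])) for x in X]
        new_centers = []
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            # Keep the old center if a cluster becomes empty.
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members)) if members else centers[j]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def wbsk(X: Sequence[Point], y: Sequence[int], k: int = 3, minority: int = 1, seed: int = 0):
    """Hypothetical WBSK-style bi-directional resampling (assumptions noted inline)."""
    rng = random.Random(seed)
    labels = kmeans_labels(X, k, rng)
    n_min = sum(1 for t in y if t == minority)
    gap = (len(y) - n_min) - n_min
    if gap <= 0 or n_min == 0:
        return list(X), list(y)
    overall = n_min / len(y)

    # cluster id -> ([minority indices], [majority indices])
    groups: Dict[int, Tuple[List[int], List[int]]] = {}
    for i, (lab, t) in enumerate(zip(labels, y)):
        groups.setdefault(lab, ([], []))[0 if t == minority else 1].append(i)
    # ASSUMPTION: "minority-rich" = minority fraction >= dataset-wide minority fraction.
    rich = [c for c, (mn, mj) in groups.items() if len(mn) / (len(mn) + len(mj)) >= overall]

    # ASSUMPTION: half of the gap is filled by oversampling, the rest by undersampling.
    n_over, n_under = gap // 2, gap - gap // 2

    # ASSUMPTION: each minority-rich cluster is weighted by its minority count.
    weights = {c: len(groups[c][0]) for c in rich}
    total = sum(weights.values())
    alloc = {c: n_over * w // total for c, w in weights.items()}
    for c in sorted(rich, key=weights.get, reverse=True)[: n_over - sum(alloc.values())]:
        alloc[c] += 1  # hand out the rounding remainder to the heaviest clusters

    new_X: List[Point] = []
    new_y: List[int] = []
    for c in rich:
        pts = [X[i] for i in groups[c][0]]
        for _ in range(alloc[c]):
            # Interpolate between two minority points of the SAME cluster.
            a, b = rng.choice(pts), rng.choice(pts)
            g = rng.random()
            new_X.append(tuple(ai + g * (bi - ai) for ai, bi in zip(a, b)))
            new_y.append(minority)

    # Randomly drop majority samples, preferring clusters that are not minority-rich.
    pool = [i for c in groups if c not in rich for i in groups[c][1]]
    if len(pool) < n_under:
        pool = [i for i, t in enumerate(y) if t != minority]
    drop = set(rng.sample(pool, n_under))

    kept = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in kept] + new_X, [y[i] for i in kept] + new_y
```

Restricting interpolation to pairs from the same cluster keeps synthetic points inside regions that already hold minority samples, which is one plausible reading of how the method avoids generating noise; the thesis may use a different rule.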
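A similarly hedged sketch of the FRNNPC idea in (2) follows. The abstract does not say how the radius or the per-neighbor scores are derived, so this sketch assumes a parameter-free radius equal to the largest of the query's per-class nearest-neighbor distances (so every class keeps at least one candidate), scores each candidate as its class's inverse training frequency divided by its distance rank, and stops the competition as soon as the leading class can no longer be overtaken by the candidates still to be scored. The names `frnn_candidates`, `progressive_competition`, and `frnnpc_predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Tuple[float, ...]


def frnn_candidates(X: Sequence[Point], y: Sequence[Hashable], q: Point) -> List[Tuple[float, Hashable]]:
    """Fixed-radius filter: keep only training samples within radius r of the query."""
    nearest: Dict[Hashable, float] = {}
    for x, t in zip(X, y):
        nearest[t] = min(math.dist(x, q), nearest.get(t, math.inf))
    # ASSUMPTION: parameter-free radius = largest per-class nearest distance,
    # so that every class keeps at least one candidate.
    r = max(nearest.values())
    cands = [(math.dist(x, q), t) for x, t in zip(X, y)]
    return sorted((c for c in cands if c[0] <= r), key=lambda c: c[0])


def progressive_competition(cands: Sequence[Tuple[float, Hashable]],
                            class_weight: Dict[Hashable, float]) -> Hashable:
    """Score neighbors nearest-first until one class can no longer be overtaken."""
    # ASSUMPTION: score = inverse class frequency / distance rank (1-based).
    scores = [class_weight[t] / rank for rank, (_, t) in enumerate(cands, start=1)]
    remaining: Dict[Hashable, float] = defaultdict(float)
    for s, (_, t) in zip(scores, cands):
        remaining[t] += s
    acc: Dict[Hashable, float] = defaultdict(float)
    for s, (_, t) in zip(scores, cands):
        acc[t] += s
        remaining[t] -= s
        leader = max(acc, key=acc.get)
        # Stop early once no rival could catch up even with all its remaining scores.
        if all(acc[leader] > acc[c] + remaining[c] for c in remaining if c != leader):
            return leader
    return max(acc, key=acc.get)


def frnnpc_predict(X: Sequence[Point], y: Sequence[Hashable], q: Point) -> Hashable:
    counts = Counter(y)
    weight = {c: 1.0 / n for c, n in counts.items()}  # inverse class frequency
    return progressive_competition(frnn_candidates(X, y, q), weight)
```

The inverse-frequency weight is what lets a few nearby minority samples outscore a larger number of majority samples; it stands in for the thesis's actual scoring rule, which the abstract does not specify.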

Keywords/Search Tags:

Oversampling, undersampling, clustering, imbalanced dataset, nearest neighbor rule

Related items

1	Research On Imbalanced Dataset Classification Based On Oversampling Technique
2	Research Of Imbalanced Datasets Preprocessing Combined With Clustering
3	Research And Application Of Equalization Method For Imbalanced Dataset
4	Research On Under-sampling Algorithm For Imbalanced Data Based On Clustering And Its Application
5	Research On Neighborhood-aware Imbalanced Data Sampling Classification
6	Random K-Nearest Neighbor Algorithm With Application To Bankruptcy Prediction
7	Research On The Key Technologies And The Applications For The Class Of Imbalance Problem
8	Research On Imbalanced Data Undersampling Classification Based On Constructive Covering
9	Imbalanced Classification Methods For Complex Distribution Characteristics
10	Research On Optimal-Nearest-Neighbor And Reverse Visible Nearest Neighbor Queries