Research And Application Of Imbalanced Data Processing Algorithm

Posted on:2020-04-17

Degree:Master

Type:Thesis

Country:China

Candidate:Y Yu

GTID:2428330590979101

Subject:Electronic and communication engineering

Abstract/Summary:

In recent years, with the development of computer science and electronic communication technology, we have entered the era of big data. The explosive growth in the volume and variety of raw data has created an urgent need for data processing technology in all walks of life, and has also provided tremendous opportunities for the development of data mining and machine learning. Traditional algorithms generally assume a balanced class distribution and equal misclassification costs. In many real situations, however, the data to be processed are imbalanced; examples include fingerprint recognition, face recognition, and facial age estimation. Classification algorithms for imbalanced data have therefore become a hot topic in machine learning and data mining. This thesis studies imbalanced data processing algorithms, and the work covers the following three aspects.

First, when dealing with imbalanced data, traditional algorithms usually consider only the spatial distribution of the data and ignore spatial distance. To address this shortcoming, a novel ensemble method based on K-means and an improved MaxDistance rule is proposed. The method combines the spatial distribution and spatial distance characteristics of the original data, and transforms a two-class imbalanced problem into a balanced one without losing any useful information or adding any artificial data. Experimental results show that, on the same public standard data sets, the proposed method performs better than existing methods for two-class imbalanced data.

Second, an under-sampling method that combines feature weighting with clustering, called the Uscfk algorithm, is proposed. To improve classification performance on imbalanced data, the method increases the weights of features that have a large impact on the classification result and decreases the weights of features that have a small impact. The weighted features are then used together with the K-means algorithm to sample, from each class, the data most suitable for classification. Specifically, the method optimizes how feature weights are assigned, so that the samples selected are those most conducive to the classification decision. A novel classification model for imbalanced data is thus constructed from the combination of feature weight assignment and clustering. An experiment on the KEEL data sets was conducted to evaluate the combined algorithm, and the results verified that the proposed method improves classification performance on imbalanced data.

Finally, the proposed model is tested on the public standard wine data set. Compared with traditional algorithms, the proposed algorithm effectively improves classification accuracy on imbalanced data, and it also performs well when applied to wine classification.
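
As a rough illustration of the Uscfk idea described in the abstract (feature weighting combined with K-means under-sampling), the sketch below may help. It is not the thesis's implementation: the abstract does not give the weight formula or the sample selection rule, so the Fisher-style weight score, the choice of one cluster per minority sample, and the "keep the real sample nearest each centroid" rule are all assumptions, and the function names are hypothetical.

```python
import random


def feature_weights(X, y):
    """Assumed weighting: a Fisher-like score per feature, normalized to sum to 1.

    Features whose class means are far apart relative to their within-class
    variance receive larger weights.
    """
    weights = []
    for j in range(len(X[0])):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row[j])
        means = [sum(v) / len(v) for v in groups.values()]
        variances = [
            sum((x - m) ** 2 for x in v) / len(v)
            for v, m in zip(groups.values(), means)
        ]
        weights.append((max(means) - min(means)) ** 2 / (sum(variances) + 1e-12))
    total = sum(weights) or 1.0
    return [w / total for w in weights]


def weighted_sq_dist(a, b, w):
    """Squared Euclidean distance with one weight per feature."""
    return sum(wj * (aj - bj) ** 2 for aj, bj, wj in zip(a, b, w))


def weighted_kmeans(points, k, w, iters=20, seed=0):
    """Plain Lloyd-style K-means that uses the weighted distance above."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k), key=lambda c: weighted_sq_dist(p, centroids[c], w)
            )
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def undersample_majority(X, y, majority_label, seed=0):
    """Reduce the majority class to the size of the remaining samples.

    Assumption: cluster the majority class into as many clusters as there are
    non-majority samples, then keep the real majority sample closest to each
    centroid, so no synthetic data is created.
    """
    w = feature_weights(X, y)
    majority = [row for row, label in zip(X, y) if label == majority_label]
    minority = [(row, label) for row, label in zip(X, y) if label != majority_label]
    centroids = weighted_kmeans(majority, len(minority), w, seed=seed)
    remaining = list(majority)
    chosen = []
    for c in centroids:
        best = min(remaining, key=lambda p: weighted_sq_dist(p, c, w))
        chosen.append(best)
        remaining.remove(best)
    X_bal = chosen + [row for row, _ in minority]
    y_bal = [majority_label] * len(chosen) + [label for _, label in minority]
    return X_bal, y_bal, w


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 does not.
    X = [[i * 0.1, float((i * 7) % 5)] for i in range(20)]
    X += [[10 + i * 0.1, float((i * 3) % 5)] for i in range(4)]
    y = [0] * 20 + [1] * 4
    X_bal, y_bal, w = undersample_majority(X, y, majority_label=0)
    print(w, y_bal)
```

The balanced output can then be passed to any standard classifier. In practice a library K-means and a learned or tuned weighting would likely replace these hand-written pieces.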

Keywords/Search Tags:

machine learning, imbalanced data, clustering method, ensemble method, sampling method

Related items

1	Research On Imbalanced Data Classification Based On Sampling Method And Ensemble Learning
2	Research On Decision Tree Classification Method Of Imbalanced Data Based On Reinforcement Learning
3	Research On Binary Imbalanced Large Data Classification And Its Application
4	Imbalanced Data Classification Algorithm Based On Unsupervised Intelligent Under Sampling Method
5	Research On Ensemble Method Of Structured Support Vector Machine For Imbalanced Data
6	Research On Unbalanced Learning Based On Sampling Method
7	Data Distribution-driven Adaptive Hybrid Sampling Method For Imbalanced Data Processing
8	Research Of Imbalanced Data Classification Method Based On Oversampling And Ensemble Learning
9	Research On Ensemble Method To Imbalanced Data Classification With Reinforcement Learning Mechanism
10	A Study Of Ensemble Learning Method For Imbalanced Data Classification And Its Applications