Research On Virtual Sample Generation Technology Based On KDE And Copula Function And Its Application To Imbalanced Dataset Classification

Posted on:2020-10-20

Degree:Master

Type:Thesis

Country:China

Candidate:S X Wang

Full Text:PDF

GTID:2428330602461510

Subject:Control Science and Engineering

Abstract/Summary:

The skewed class distribution of data and the information gaps between samples largely restrict the accuracy and soundness of classification algorithms. How to handle imbalanced samples reasonably and effectively, and thereby improve classifier performance, is an active research topic. The imbalance problem arises mainly from the difference in the number of samples across categories, which shifts the classifier's decision surface; in addition, the lack of information in the original data space leaves the classifier with insufficient feature learning. Virtual sample technology can correct the shift of the decision surface caused by unequal category sizes and can fill the information gaps in the original data. In traditional strategies for imbalanced samples, virtual samples are constructed only as linear combinations of original samples, so the resulting features cannot describe the samples correctly. Therefore, this paper proposes a virtual sample generation method, Copula-KDE VSG, based on Kernel Density Estimation (KDE) and the Copula function, to address data skew and information loss in imbalanced classification problems. Using kernel density estimation, the marginal probability density of each data dimension is estimated, and a copula function is constructed to build the joint probability density model. New virtual samples are generated from the constructed joint probability density and are further optimized by a pseudo-labeling technique. An improved Copula-KDE VSG method is also proposed and is shown to further improve the soundness of the generated virtual samples. Copula-KDE VSG generates virtual samples that match the characteristics of the original samples and fill the information gaps between samples, thereby improving the classifier's ability to learn positive samples. Two real cases (nuclear protein localization data and banknote wavelet transform data) are used to compare the SMOTE method and its improved variant cluster-SMOTE with the proposed method under four classifiers. The comparisons show that the proposed method is effective, practical and advanced. The experimental results show that the virtual samples generated by this method retain the feature information of the original samples and fill the gaps between samples, which indicates that the generated virtual samples are reasonable.
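
The generation step described in the abstract (KDE marginals, a copula linking them, then sampling from the joint model) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not name the copula family, the kernel, or the bandwidth rule, so the Gaussian copula, the Gaussian kernel, and the normal-reference bandwidth used here are assumptions, and all function names are hypothetical. The pseudo-labeling refinement and the improved variant are omitted.

```python
import math
import random
from statistics import NormalDist, stdev

STD_NORMAL = NormalDist()  # standard normal: used as KDE kernel and in the copula


def rule_of_thumb_bandwidth(values):
    """Normal-reference bandwidth for a 1-D Gaussian KDE (an assumed choice)."""
    return 1.06 * stdev(values) * len(values) ** (-1 / 5)


def kde_cdf(x, values, h):
    """CDF of a Gaussian KDE: the mean of the per-sample normal CDFs."""
    return sum(STD_NORMAL.cdf((x - v) / h) for v in values) / len(values)


def kde_quantile(u, values, h, iterations=60):
    """Invert the (monotone) KDE CDF by bisection."""
    lo, hi = min(values) - 10 * h, max(values) + 10 * h
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if kde_cdf(mid, values, h) < u:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(max(matrix[i][i] - s, 1e-12))
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def generate_virtual_samples(minority, n_new, seed=0):
    """Draw n_new virtual samples from a Gaussian copula with KDE marginals."""
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*minority)]
    dim = len(columns)
    bandwidths = [rule_of_thumb_bandwidth(col) for col in columns]

    # Step 1: map each dimension to normal scores through its KDE CDF.
    scores = []
    for col, h in zip(columns, bandwidths):
        u = [min(max(kde_cdf(x, col, h), 1e-6), 1 - 1e-6) for x in col]
        scores.append([STD_NORMAL.inv_cdf(p) for p in u])

    # Step 2: estimate the Gaussian copula correlation from the normal scores.
    corr = [[1.0 if i == j else pearson(scores[i], scores[j])
             for j in range(dim)] for i in range(dim)]
    lower = cholesky(corr)

    # Step 3: sample correlated normals, convert to uniforms, invert the marginals.
    samples = []
    for _ in range(n_new):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        correlated = [sum(lower[i][k] * z[k] for k in range(i + 1))
                      for i in range(dim)]
        samples.append([kde_quantile(STD_NORMAL.cdf(c), col, h)
                        for c, col, h in zip(correlated, columns, bandwidths)])
    return samples
```

Calling `generate_virtual_samples(minority_rows, n_new=100)` on a list of minority-class feature vectors returns 100 new feature vectors. Unlike SMOTE-style linear interpolation, the samples are drawn from an estimated joint density, so they are not confined to line segments between existing points. The sketch assumes every feature is continuous and non-constant, since a zero standard deviation would give a zero bandwidth.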

Keywords/Search Tags:

unbalanced data, data classification, virtual sample generation, kernel density estimation, Copula function

Related items

1	The SVM Algorithm And Its Application Based Data Preprocessing In The Kernel Space For Unbalanced Data
2	Research On Adaboost Improved Algorithm For Unbalanced Data
3	The Research On Estimation Of Distribution Algorithm Based On Copula Theory
4	Image Classification Research Based On Improved PSO Algorithm
5	Research On Virtual Sample Generation Technology Based On Information Diffusion
6	Research And Application Of Unbalanced Data Classification Algorithm Based On Resampling
7	Unbalanced Data Classification In Credit Risk Assessment
8	Research On Anomaly Detection And Classification Of Labeled Data Based On Data Density
9	Multiclass Classification Method Research With SVM Arithmetic
10	Research On Employee Turnover Prediction Based On SMOTE-SVM Under Unbalanced Data