
Research On Ensemble Learning Algorithm For Imbalanced Data With Noise And Low Quality Features

Posted on: 2022-09-12
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Zhang
GTID: 2518306317493674
Subject: Management Science and Engineering
Abstract/Summary:
Imbalanced data are data in which the number of samples differs greatly from one class to another. Classifying such data is a common task in practical applications such as financial credit evaluation, financial fraud detection and disease detection. Imbalanced data are often accompanied by missing values, noisy samples, low-quality features, obsolete data and inconsistent data. Among these, noisy samples increase the training time and complexity of a model, while low-quality features, which include redundant and irrelevant features, lead to the curse of dimensionality, make the classification model harder to build and reduce classification accuracy. When the data contain both noisy samples and low-quality features, data quality drops sharply, which further lengthens training time and degrades the predictive performance of the classification model. Problems such as class imbalance, class overlap and the uneven distribution of minority-class samples cause the more valuable minority-class samples to be misclassified. Designing a suitable method for imbalanced data containing noise and low-quality features is therefore of great significance for improving classification performance. This thesis focuses on imbalanced data with noisy samples and low-quality features and aims to solve the classification problem for such data. The main research content covers the following three aspects:

(1) The performance of a classification model is strongly correlated with the quality of its training data, and noise in the data increases model complexity, which lowers predictive performance and lengthens training time. This thesis proposes an ensemble learning method for imbalanced data with noisy samples. It combines noisy-sample identification and sample weighting with under-sampling and embeds them in an ensemble learning framework, while avoiding the misidentification of boundary samples as noise. The method effectively mitigates the effects of noisy samples and class imbalance on the classification model, and experiments verify its effectiveness.

(2) Redundant and irrelevant features hinder the construction of the classification model, alongside problems such as class imbalance and class overlap. To address this, an ensemble learning method for imbalanced data with low-quality features is proposed. The method first uses under-sampling to remove majority-class samples that have no influence on the overlapping and boundary regions, producing a balanced data set. It then computes feature weights with the weighting method proposed in this thesis and filters features either empirically or according to classifier performance. Experiments verify the validity of the proposed method.

(3) In the financial field, credit card fraud detection data exhibit class imbalance, missing values, noise, low-quality features and similar problems. To address these problems, the methods proposed in this thesis are used to build a credit card fraud detection model, and an analysis of the experimental results verifies their performance in a practical application.

By studying low-quality imbalanced data, this work reduces the influence of noisy samples and low-quality features on classification, so that imbalanced data can be classified correctly more easily and classification performance on low-quality imbalanced data is improved. The research is of both theoretical and practical significance for the classification of low-quality imbalanced data.
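The abstract gives no algorithmic details, so the following is only a minimal NumPy sketch of the general kind of pipeline it describes: filter noisy samples without discarding boundary samples, weight and filter features, build balanced subsets by under-sampling, and combine base learners by voting. Every concrete choice here is an assumption for illustration and not the thesis's method: the "all k neighbours disagree" noise rule, the Fisher score used as a stand-in feature weight, plain random under-sampling (the thesis instead targets majority samples outside the overlapping and boundary regions), the nearest-centroid base learner, and the hypothetical function names `knn_noise_mask`, `fisher_weights`, `fit` and `predict`. Sample weighting is omitted.

```python
import numpy as np


def knn_noise_mask(X, y, k=5):
    # Assumed rule: flag a sample as noise only when ALL k nearest neighbours
    # carry the other label. Mixed neighbourhoods mark boundary samples, which
    # are kept rather than discarded as noise.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # squared distances
    np.fill_diagonal(d, np.inf)                             # exclude self-matches
    nn = np.argsort(d, axis=1)[:, :k]                       # k nearest neighbours
    return (y[nn] != y[:, None]).sum(axis=1) == k


def fisher_weights(X, y):
    # Stand-in feature weighting (Fisher score): squared gap between class
    # means divided by the summed within-class variances, per feature.
    a, b = X[y == 0], X[y == 1]
    gap = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    return gap / (a.var(axis=0) + b.var(axis=0) + 1e-12)


def fit(X, y, n_estimators=11, k=5, n_features=None, seed=0):
    # Assumes label 1 is the minority class and label 0 the majority class.
    rng = np.random.default_rng(seed)
    keep = ~knn_noise_mask(X, y, k)
    X, y = X[keep], y[keep]
    # Keep the n_features highest-weighted features (all of them if None).
    order = np.argsort(fisher_weights(X, y))[::-1]
    feats = order[: n_features if n_features is not None else X.shape[1]]
    X = X[:, feats]
    minority, majority = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    models = []
    for _ in range(n_estimators):
        # Random under-sampling: every minority sample plus an equal-sized
        # random draw from the majority class gives one balanced subset.
        drawn = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, drawn])
        Xs, ys = X[idx], y[idx]
        # Nearest-centroid base learner: one mean vector per class.
        models.append((Xs[ys == 0].mean(axis=0), Xs[ys == 1].mean(axis=0)))
    return feats, models


def predict(feats, models, X):
    X = X[:, feats]
    votes = np.zeros(len(X))
    for c0, c1 in models:
        # Each base learner votes for class 1 when a sample is closer to c1.
        votes += ((X - c1) ** 2).sum(axis=1) < ((X - c0) ** 2).sum(axis=1)
    return (votes > len(models) / 2).astype(int)  # majority vote


# Usage on synthetic data: 200 majority vs 20 minority samples, where only
# feature 0 separates the classes.
rng = np.random.default_rng(1)
X = rng.normal(size=(220, 4))
y = np.array([0] * 200 + [1] * 20)
X[y == 1, 0] += 4.0
feats, models = fit(X, y, n_features=2)
pred = predict(feats, models, X)
print("selected features:", feats)
print("accuracy:", (pred == y).mean(), "minority recall:", pred[y == 1].mean())
```

Requiring every one of the k neighbours to disagree is what spares boundary samples in this sketch: a point in the overlap region usually has at least one same-class neighbour, so only isolated points deep inside the other class are dropped. Training each base learner on a different balanced subset lets the ensemble see most of the majority class without any single learner being dominated by it.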
Keywords/Search Tags: Class imbalance, Noisy sample, Low-quality feature, Under-sampling, Financial fraud detection