
A Non-sampling AdaBoost With Information Entropy For Imbalanced Learning

Posted on: 2022-02-23    Degree: Master    Type: Thesis
Country: China    Candidate: Y Xie    Full Text: PDF
GTID: 2518306509979559    Subject: Applied Mathematics
Abstract/Summary:
In today's information-globalized society, big data is an important carrier of diverse information, and deeply mining the valuable information it contains is a hot research topic. Against this background, imbalanced data can be found everywhere, arising in fields such as medical diagnosis and financial risk prevention. Because of the imbalanced nature of such data, traditional classification methods cannot achieve good results, and a variety of learning methods for imbalanced data have therefore emerged. At present, sampling is a popular way to improve classification performance on imbalanced data. However, adding generated samples often changes the distribution of the raw data and makes the decision boundary more or less unreliable. To avoid changing the distribution of the dataset, this thesis proposes a Non-Sampling AdaBoost with Information Entropy for imbalanced learning (NSAIE).

First, the proposed method applies no sampling technique and keeps the distribution of the raw imbalanced dataset intact. Starting from feature selection, information entropy and mutual information are introduced to measure the importance of features, and the features with high information entropy values are selected. Each selected feature is weakly correlated with the other features and strongly correlated with the class label; such a feature is called an entropy-feature. Specifically, inspired by mutual information theory, we design a heuristic entropy-feature criterion that effectively maps the data from a high-dimensional feature space to a low-dimensional one.

Second, the algorithm is improved under the AdaBoost framework. To ensure that important minority samples are correctly classified, we set an adaptive weight for each base classifier in the framework. The ensemble learning model approaches the imbalanced decision boundary step by step and stops the learning process when the evaluation metrics are high enough or the maximum number of iterations is reached. In this way, the loss on minority samples is minimized while the maximum number of majority samples can be removed.

Finally, we conduct comparative experiments on 12 datasets, where visualizing the NSAIE algorithm on some of the datasets helps to better understand the proposed classification idea. The experimental results show that the proposed algorithm is superior to 17 other common methods, achieving 88.72%, 79.41%, 80.39%, and 79.41% respectively on four evaluation metrics. These scores reflect the superiority of the proposed method. In addition, when the NSAIE algorithm is applied to a multi-class dataset, all metrics achieve top-1 performance, indicating that the method has practical significance for solving multi-class classification problems. To verify the differences between the NSAIE algorithm and the other methods, a t-test with a significance level of 0.05 is carried out. The results show that the NSAIE algorithm has significant advantages over the other imbalanced learning methods.
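The entropy-feature selection described above follows a max-relevance, min-redundancy idea: keep features that share high mutual information with the class label and low mutual information with the other features. For reference, the mutual information between a feature X and label Y is I(X;Y) = H(Y) - H(Y|X) = sum over x,y of p(x,y) log[ p(x,y) / (p(x)p(y)) ]. The abstract does not give the exact scoring rule, so the following is only a minimal Python sketch of such a criterion using scikit-learn's mutual information estimators; the function name entropy_feature_scores and the relevance-minus-redundancy score are illustrative assumptions, not the thesis's formula.

import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def entropy_feature_scores(X, y, k=10):
    """Rank features by relevance to the label minus redundancy with other features."""
    # Relevance: estimated mutual information between each feature and the class label.
    relevance = mutual_info_classif(X, y, random_state=0)
    n_features = X.shape[1]
    # Redundancy: average pairwise mutual information with the remaining features.
    # mutual_info_score expects discrete values, so continuous features are assumed
    # to have been discretized (e.g. binned) beforehand.
    redundancy = np.array([
        np.mean([mutual_info_score(X[:, i], X[:, j])
                 for j in range(n_features) if j != i])
        for i in range(n_features)
    ])
    # High score = strong correlation with the label, weak correlation with other features.
    score = relevance - redundancy
    return np.argsort(score)[::-1][:k]   # indices of the k best entropy-features

Slicing the data as X[:, entropy_feature_scores(X, y, k)] then gives the low-dimensional representation that the abstract refers to.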
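The abstract specifies neither the exact adaptive weighting of the base classifiers nor the stopping metric, so the following is only a hedged sketch of a standard AdaBoost training loop with the early-stopping behaviour described above, assuming binary labels in {-1, +1} with +1 as the minority class; the target_f1 threshold and the F1-based stopping rule are illustrative assumptions.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

def boost_with_early_stop(X, y, n_rounds=50, target_f1=0.90):
    n = len(y)
    w = np.full(n, 1.0 / n)                     # uniform initial sample weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # classical AdaBoost learner weight
        # Misclassified samples gain weight, so later rounds concentrate on the
        # hard examples near the imbalanced boundary (often minority samples).
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        learners.append(stump)
        alphas.append(alpha)
        # Stop early once the ensemble is good enough on the minority class,
        # mirroring the "stop when the metrics are high enough" rule above.
        margin = sum(a * l.predict(X) for a, l in zip(alphas, learners))
        if f1_score(y, np.where(margin >= 0, 1, -1), pos_label=1) >= target_f1:
            break
    return learners, alphas

The early stop is what lets the ensemble approach the decision boundary step by step without over-fitting the majority class in later rounds.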
Keywords/Search Tags: Imbalanced Learning, Information Entropy, Feature Selection, Ensemble Learning