In machine learning, the class imbalance issue limits classification performance. Class imbalance refers to the situation in which the sample size of the minority class is far smaller than that of the majority class, so that traditional learning models are biased toward the majority class, leading to poor classification performance on the minority class. In recent years, class imbalance has attracted much attention from researchers dealing with real data sets from fields such as medical diagnosis, intrusion detection, and credit rating. At the same time, with the rise of network and communication technology, modern society has entered an era of information explosion. This brings a further problem: raw data contain a great deal of invalid and redundant information. Because data quality directly affects the classification performance of a learning model, how to extract effective information from raw data has become a key problem, and feature selection has therefore become a crucial step in data mining.

In practice, class imbalance and high feature dimensionality often occur together. Past studies have confirmed that, by selecting features more relevant to the minority class, feature selection algorithms can effectively improve the generalization ability of the subsequent learning model on imbalanced data and reduce its time complexity. However, redundant features remain one of the limitations of feature selection algorithms. The main work of this thesis is to study feature selection algorithms for imbalanced data that reduce feature redundancy. Aiming at the class imbalance issue, the curse of dimensionality, and the limitations of LDA-based feature selection algorithms, the main research contents and innovations of this thesis are as follows:

To address the class imbalance problem and reduce feature redundancy, a feature selection algorithm, GRM-DFS, based on global redundancy minimization is proposed. However, the class imbalance
issue is not considered by most existing feature selection algorithms. A regularization of LDA, IR-LDA, which emphasizes the minority class, is proposed to improve classification performance on the minority class. IR-LDA is then combined with the global redundancy minimization algorithm, which not only accounts for the class imbalance problem but also reduces the redundancy of the selected feature subset. Comparative experiments show that the proposed IR-LDA regularization significantly improves classification performance and reduces feature-subset redundancy, and that the proposed GRM-DFS algorithm clearly outperforms the competing algorithms.

To deal with the problems that LDA-based feature selection algorithms face on high-dimensional, imbalanced data, an improved LDA-based feature selection algorithm is proposed. The off-diagonal elements of the within-class scatter matrix of LDA are computed from covariances, which are meant to reflect the relationships between features. Owing to the limitations of covariance, the squared Pearson correlation coefficient replaces covariance in calculating the correlation between features. Then, combining the improved LDA with the L2 sparse norm, a discriminant feature selection method that accounts for the class imbalance problem is proposed, and its effectiveness is verified.

Previous studies have shown that reducing the degree of overlap in the data can effectively improve the performance of classification algorithms on high-dimensional, imbalanced data. The overlap-degree metric is improved by increasing the weight of the minority class when computing the overlap degree and by combining it with a global redundancy minimization algorithm with adaptive parameters. To minimize the redundancy of feature subsets while exploiting the overlap degree, a feature selection method, MODFS, based on the improved overlap degree and global redundancy minimization is proposed. Experiments on imbalanced data
show that the proposed algorithm effectively improves the classification performance of the subsequent learning model.
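The abstract names GRM-DFS but does not give its formulation. Global redundancy minimization is commonly posed as re-ranking an initial feature relevance score vector against a pairwise feature-redundancy matrix; the minimal sketch below illustrates that general idea only. The function name `grm_rerank`, the use of squared Pearson correlation as the redundancy measure, the projected-gradient solver, and the trade-off parameter `lam` are all assumptions, not the thesis's algorithm.

```python
import numpy as np

def grm_rerank(X, scores, lam=0.5, n_iter=500):
    """Re-rank features by trading initial relevance scores against
    pairwise redundancy: minimize x^T A x - lam * scores^T x over x >= 0,
    where A holds squared Pearson correlations between features."""
    A = np.corrcoef(X, rowvar=False) ** 2     # redundancy matrix (diagonal = 1)
    d = X.shape[1]
    x = np.full(d, 1.0 / d)                   # nonnegative feature weights
    step = 0.5 / np.linalg.norm(A, 2)         # safe step for the 2A-Lipschitz gradient
    for _ in range(n_iter):
        grad = 2.0 * A @ x - lam * scores     # gradient of the quadratic objective
        x = np.maximum(x - step * grad, 0.0)  # project back onto x >= 0
    return np.argsort(-x)                     # feature indices, most useful first
```

With equal initial scores, a feature that duplicates another is penalized through the redundancy matrix, so a unique feature rises to the top of the ranking.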
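The second contribution replaces covariance with the squared Pearson correlation coefficient in the off-diagonal entries of the within-class scatter matrix. A minimal sketch of one way such a matrix could be assembled, assuming variances are kept on the diagonal and each class is weighted by its size (details the abstract does not specify):

```python
import numpy as np

def within_class_scatter_pearson(X, y):
    """Within-class scatter whose off-diagonal entries use the squared
    Pearson correlation between features instead of raw covariance,
    putting feature dependence on a scale-free [0, 1] scale.
    This construction is an assumption, not the thesis's exact definition."""
    d = X.shape[1]
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        cov = np.cov(Xc, rowvar=False)           # ordinary class covariance
        Sc = np.corrcoef(Xc, rowvar=False) ** 2  # squared Pearson correlations
        np.fill_diagonal(Sc, np.diag(cov))       # keep variances on the diagonal
        Sw += len(Xc) * Sc                       # weight each class by its size
    return Sw
```

Because squared correlations lie in [0, 1] regardless of feature scale, the off-diagonal entries are comparable across feature pairs, which is the stated motivation for dropping raw covariance.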
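The third contribution raises the minority-class weight when computing the overlap degree. The abstract does not define the overlap measure, so the sketch below uses a per-feature weighted Fisher ratio as a stand-in; the measure itself, the weighting scheme, and the function name `weighted_overlap` are all assumptions:

```python
import numpy as np

def weighted_overlap(X, y, minority_weight=2.0):
    """Per-feature class-overlap score built from a weighted Fisher
    ratio, with the minority class up-weighted so that overlap involving
    it counts more. Returns values in (0, 1]; higher means more overlap.
    The exact weighting scheme is an illustrative assumption."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    overlap = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        f = X[:, j]
        between, within = 0.0, 0.0
        for c, n_c in zip(classes, counts):
            w = minority_weight if c == minority else 1.0
            fc = f[y == c]
            between += w * n_c * (fc.mean() - f.mean()) ** 2
            within += w * n_c * fc.var()
        fisher = between / (within + 1e-12)   # weighted Fisher ratio
        overlap[j] = 1.0 / (1.0 + fisher)     # high separation -> low overlap
    return overlap
```

A feature whose class-conditional distributions are well separated gets a low overlap score, so ranking features by ascending overlap (before redundancy minimization) favors discriminative features, with extra emphasis on separating the minority class.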