Research On Ensemble Learning Algorithm For High-Dimensional Data Classification

Posted on: 2023-07-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y H Xu
GTID: 1528306830982719
Subject: Computer Science and Technology
Abstract/Summary:
In the era of the digital economy, data is a key factor of production and contains important knowledge and information. To extract valuable knowledge and latent rules from massive data, automatic classification has become a research hotspot in machine learning. Many practical applications of pattern recognition and machine learning, such as bioinformatics, gene microarray analysis, image recognition, and text classification, face the problem of high-dimensional data. Complex high-dimensional data contains a large number of noisy and redundant features, which increase both the storage overhead and the complexity of constructing a classification model. High-dimensional data also often comes with a small sample size and class imbalance, both of which seriously harm classification algorithms: they readily lead to the curse of dimensionality, overfitting, and bias toward the majority class, causing a sharp decline in classification performance. High-dimensional data classification is therefore highly challenging, and effective, robust classification algorithms are urgently needed.

Because it is difficult to construct a single optimal classifier for high-dimensional data, ensemble learning has become an effective strategy for this problem. By constructing several diverse classifiers and integrating their predictions, ensemble learning can obtain a more accurate and robust result. This dissertation focuses on classifying high-dimensional data and high-dimensional imbalanced data with ensemble learning, and proposes several ensemble learning frameworks. The aim is to improve the accuracy and diversity of ensemble members on high-dimensional data, and thereby construct a more powerful and robust classifier ensemble system. The main work is summarized as follows:

1) For the classification of high-dimensional data, this dissertation proposes an adaptive classifier ensemble learning algorithm based on spatial perception (AdaSPEL). First, a local spatial perception method is designed, which applies feature transformation to multiple random, disjoint local subspaces to alleviate the failure of algorithms on high-dimensional data. This method also promotes the accuracy and diversity of ensemble members. Then, a cross-space perception method based on sample distribution is designed to generate cross-space enhanced features, which provide a clearer global view of the data. Finally, an adaptive selective ensemble approach based on local and global evaluation mechanisms is designed to improve the classification performance of the ensemble system. The method is compared with mainstream ensemble methods on different high-dimensional data sets, and the experimental results verify its effectiveness.

2) Because high-dimensional data contains a large number of redundant and invalid features, random division cannot guarantee the quality of each local subspace, which may harm the algorithm. To address this limitation, this dissertation proposes a classifier ensemble algorithm based on subspace enhancement (CESE). First, a superior subspace enhancement is proposed to perform effective feature selection and transformation on high-dimensional data. Under different random scenarios, it generates multiple subspace-enhanced features that are diverse and discriminative. Then, a mixed space enhancement is proposed, which performs multiscale rotation reconstruction on the subspace-enhanced features to obtain mixed enhanced features and strengthen the representation ability of the features. Finally, different combination strategies for the enhanced features are designed to improve classification performance. Experimental results on different high-dimensional data sets demonstrate that CESE outperforms other mainstream classifier ensemble methods.

3) This dissertation proposes an adaptive subspace optimization ensemble method (ASOEM), which aims to construct a powerful ensemble system for high-dimensional imbalanced data classification. First, an adaptive subspace generation method is proposed, which finds a more robust superior subspace by considering the performance of features in different scenarios, alleviating the effects of redundant and invalid features in high-dimensional imbalanced data. Then, rotated subspace optimization transforms the univariate features of the superior subspace into multivariate features to enhance the representation ability and diversity of the features. Finally, several extended versions of ASOEM are implemented with different resampling strategies to verify the generality of the algorithm. Experimental results on high-dimensional imbalanced data sets demonstrate that the algorithm outperforms mainstream imbalance learning approaches and classifier ensemble methods.

4) Because it is difficult to construct an optimal feature subspace for high-dimensional imbalanced data, this dissertation proposes a classifier ensemble algorithm based on multi-view optimization (CEMVO), which handles high-dimensional imbalanced data from two aspects: feature optimization and sample optimization. First, an optimized subview generation method is proposed to generate multiple optimized subviews under different random scenarios. Because these subviews are generated from different scenarios, their generalization abilities vary. A selective ensemble of optimized subviews is therefore proposed, which integrates the subviews into an optimized view with stronger generalization ability. Finally, to alleviate the impact of imbalanced data on the base classifier, an over-sampling strategy is applied to the optimized view to construct a new class-balanced subset. Experimental results on different high-dimensional imbalanced data sets demonstrate the superiority of the proposed algorithm.
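
As a rough illustration of the general idea shared by these frameworks (training base classifiers on random, disjoint feature subspaces and integrating their predictions), a minimal Python sketch follows. It is not AdaSPEL, CESE, ASOEM, or CEMVO: the nearest-centroid base learner, the plain majority vote, the toy data, and all function names are assumptions chosen for brevity, and the feature transformation, selective ensemble, and resampling steps described in the abstract are omitted.

```python
import random
from collections import Counter


def split_disjoint_subspaces(n_features, n_subspaces, rng):
    """Randomly partition feature indices into disjoint groups (hypothetical helper)."""
    indices = list(range(n_features))
    rng.shuffle(indices)
    # Striding over the shuffled list yields groups that never overlap.
    return [sorted(indices[i::n_subspaces]) for i in range(n_subspaces)]


def fit_centroids(X, y, features):
    """Nearest-centroid base learner restricted to one subspace (an assumption,
    standing in for whatever base classifier a real system would use)."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_one(centroids, features, row):
    """Return the label whose centroid is closest to `row` within the subspace."""
    def sq_dist(centroid):
        return sum((row[f] - centroid[j]) ** 2 for j, f in enumerate(features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def fit_ensemble(X, y, n_subspaces=3, seed=0):
    """Train one base learner per disjoint random subspace."""
    rng = random.Random(seed)
    subspaces = split_disjoint_subspaces(len(X[0]), n_subspaces, rng)
    return [(feats, fit_centroids(X, y, feats)) for feats in subspaces]


def predict_ensemble(ensemble, row):
    """Integrate the members' predictions by simple majority vote."""
    votes = Counter(predict_one(c, feats, row) for feats, c in ensemble)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): 6 features, 2 well-separated classes.
X_toy = [[0, 0, 0, 0, 0, 0], [1, 0, 1, 0, 1, 0],
         [5, 5, 5, 5, 5, 5], [4, 5, 4, 5, 4, 5]]
y_toy = [0, 0, 1, 1]
ensemble = fit_ensemble(X_toy, y_toy, n_subspaces=3, seed=0)
print(predict_ensemble(ensemble, [0.5] * 6), predict_ensemble(ensemble, [4.5] * 6))
```

Each member sees only its own slice of the features, so the members differ from one another, and the vote combines them into a single prediction. A system along the lines of the abstract would additionally transform or select features within each subspace and weight or prune members rather than counting every vote equally.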
Keywords/Search Tags: Classification Ensemble, High-Dimensional Data Classification, Imbalance Learning, Subspace Optimization, Resampling Strategy