Research On Imbalanced Data Classification Algorithm Based On Feature Selection

Posted on: 2022-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: B J Zhao
Full Text: PDF
GTID: 2518306509965329
Subject: Software engineering
Abstract/Summary:
Imbalanced data is common in medicine, economics, and other fields. As society develops, and especially in the era of big data, more and more data sets are both high-dimensional and imbalanced, which poses a major challenge for machine learning and data mining. Classification is an active research topic in computer science. On high-dimensional imbalanced data, the accuracy of traditional classification algorithms such as decision trees, random forests, and support vector machines falls short of practical requirements. This thesis takes imbalanced data sets as its research object, analyzes them with different feature selection algorithms, and then classifies them on that basis. The aim is to balance minority-class accuracy against overall accuracy and so improve classification ability. The main contributions are as follows.

(1) Filter-based feature selection algorithms do not consider the synergy between features, and ignoring that synergy degrades classification performance. To address this, a feature selection algorithm based on feature synergy (FSBS) is proposed. The algorithm first uses mutual gain to evaluate the synergy between features, then uses the AUC value to evaluate each individual feature, and finally uses these two values to select effective features for classification. Experimental results show that the algorithm selects effective features and maintains the best classification performance with fewer features. It improves classification performance under different classifiers, especially the accuracy on minority classes.

(2) A feature selection method based on a multi-population genetic algorithm (MPCS) is proposed. It combines a multi-population genetic algorithm with a cost-sensitive algorithm to handle classification of imbalanced data sets: the multi-population genetic algorithm is first used for feature selection, and the cost-sensitive algorithm is then used for classification. Experimental results show that MPCS reduces the occurrence of local optima, finds more suitable feature combinations, and improves classification performance on imbalanced data sets. On low-dimensional data it sometimes reaches the theoretical maximum of the evaluation metrics, and on high-dimensional data it outperforms the other methods compared.

(3) An imbalanced data classification system is implemented with App Designer. The system includes 36 commonly used data sets, and users can also supply their own. The preset data sets vary in number of features and cover both low-dimensional and high-dimensional data. The system provides seven commonly used classification algorithms and seven feature selection algorithms.
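
The per-feature AUC evaluation mentioned in contribution (1) can be illustrated with a minimal sketch. The abstract does not define mutual gain or say how the synergy score and the AUC score are combined, so only the AUC step is shown here. The function names, the choice to rank by max(AUC, 1 - AUC), and the toy data are assumptions made for illustration and are not taken from the thesis.

```python
from typing import List, Sequence


def feature_auc(values: Sequence[float], labels: Sequence[int]) -> float:
    """AUC obtained when one feature is used directly as the score
    for the positive (minority) class, label 1.

    Uses the pairwise (Mann-Whitney) formulation: the fraction of
    positive/negative pairs in which the positive sample has the
    larger feature value. Ties count as half a win.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_features_by_auc(X: Sequence[Sequence[float]],
                         y: Sequence[int]) -> List[int]:
    """Return feature indices ordered from most to least discriminative.

    Assumption: a feature with AUC near 0 separates the classes as well
    as one with AUC near 1 (only the direction differs), so features
    are ranked by max(auc, 1 - auc).
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        auc = feature_auc(column, y)
        scores.append(max(auc, 1.0 - auc))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Toy imbalanced data (hypothetical): 2 minority samples, 6 majority samples.
y = [1, 1, 0, 0, 0, 0, 0, 0]
f0 = [0.9, 0.8, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]    # separates the classes
f1 = [0.5, 0.2, 0.1, 0.6, 0.3, 0.4, 0.7, 0.25]   # close to noise
f2 = [0.1, 0.3, 0.9, 0.8, 0.7, 0.6, 0.2, 0.5]    # informative, inverted
X = [list(row) for row in zip(f0, f1, f2)]

ranking = rank_features_by_auc(X, y)
```

AUC is a natural per-feature score for imbalanced data because it depends only on how minority and majority samples are ordered relative to each other, not on the class proportions. The pairwise loop is quadratic in the number of samples; a rank-based computation would be preferable on large data sets.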
Keywords/Search Tags:Imbalanced data, Feature selection, Classification, Feature synergy, Genetic algorithm