
Research On The Stability Of Feature Selection For High-dimensional Small Sample Data

Posted on: 2022-10-09  Degree: Master  Type: Thesis
Country: China  Candidate: C Jiang  Full Text: PDF
GTID: 2518306521955719  Subject: Computer technology
Abstract/Summary:
With the continuous growth of Internet data, data storage and computation have become increasingly costly. To reduce this cost, raw data is preprocessed to retain only the important parts, i.e., feature selection. In gene expression and text analysis, the data are typically high-dimensional with small sample sizes. When feature selection is applied to such data, the selected features tend to change as the samples change, so the feature subsets are unstable. Unstable subsets cause decision makers to doubt the analysis and make it hard to identify the truly critical features, which in turn degrades subsequent data analysis and result optimization. Existing feature selection algorithms usually consider only factors such as feature dimensionality and classification accuracy while ignoring the stability of the feature subset; applied to high-dimensional small-sample data, they produce noticeably unstable subsets. To address this problem, this thesis studies both the data and the feature selection algorithm, focusing on the relationships between features and on the algorithm's objective function. In addition, since traditional ensemble algorithms do not address stability within the algorithm itself, a new ensemble algorithm is proposed. The resulting feature selection is more stable and at the same time achieves better classification accuracy. The main contributions of this research are as follows:

First, high-dimensional small-sample data contains many similar features that differ little in their effect on sample classification, so selecting any one of them achieves the same goal, and which one is selected can vary between runs. To address this, a feature selection algorithm based on feature grouping and particle swarm optimization (PSO) is proposed. Mutual information is used to measure the correlation between features, and the features are divided into groups according to the strength of that correlation, so that redundancy between groups is low and redundancy within a group is high. This reduces the instability of feature subsets caused by similar features and provides an initial dimensionality reduction. Because PSO easily falls into local optima during its evolutionary iterations, an adaptive particle mutation strategy is proposed that dynamically adjusts the state of some particles.

Second, the study found that feature selection is usually designed around objectives such as feature dimensionality and classification accuracy, with stability ignored. To address this, the PSO-based feature selection is formulated as a multi-objective optimization that also takes stability into account: a stability measurement criterion is added when designing the PSO fitness function. With stability as a reference during the search, the selected feature subset improves in stability while maintaining a certain level of classification accuracy.

Finally, ensemble algorithms can improve stability, but existing ensemble algorithms are not optimized from the algorithm itself. Building on the multi-objective optimization above, this thesis optimizes the stability of the base algorithm itself and then ensembles it to select more stable features, and it preliminarily analyzes how stability varies with the dimensionality of the feature subset.
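The feature-grouping step can be sketched as follows. This is an illustrative Python implementation, assuming a histogram-based mutual-information estimate and a greedy grouping rule; the function names, the binning, and the threshold are hypothetical choices, not the thesis's exact method:

```python
import numpy as np

def mutual_info(x, y, bins=8):
    # Histogram-based mutual information estimate (in nats)
    # between two feature vectors.
    c_xy, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = c_xy / c_xy.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return float((p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])).sum())

def group_features(X, threshold=0.5):
    # Greedy grouping: a feature joins the first group whose
    # representative it shares high mutual information with;
    # otherwise it starts a new group. Groups are therefore
    # redundant internally and largely independent of each other.
    groups = []
    for j in range(X.shape[1]):
        for g in groups:
            if mutual_info(X[:, g[0]], X[:, j]) >= threshold:
                g.append(j)
                break
        else:
            groups.append([j])
    return groups
```

After grouping, one representative per group can be kept, which both shrinks the dimensionality and removes the arbitrariness of choosing among near-duplicate features.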
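The adaptive mutation idea for PSO can be illustrated with a minimal binary PSO: when the global best stalls, part of the swarm is randomly reinitialized to escape the local optimum. The sigmoid transfer function, the stall counter, and the quarter-of-the-swarm fraction below are illustrative assumptions, not the thesis's exact strategy:

```python
import numpy as np

def binary_pso(fitness, n_features, n_particles=20, iters=50,
               w=0.7, c1=1.5, c2=1.5, stall_limit=5, seed=0):
    """Binary PSO for feature selection. If the global best stalls
    for `stall_limit` iterations, the worst quarter of the swarm is
    randomly reinitialized (the adaptive mutation step)."""
    rng = np.random.default_rng(seed)
    pos = (rng.random((n_particles, n_features)) < 0.5).astype(int)
    vel = rng.normal(0.0, 0.1, (n_particles, n_features))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    gbest_fit = pbest_fit.max()
    stall = 0
    for _ in range(iters):
        r1 = rng.random((n_particles, n_features))
        r2 = rng.random((n_particles, n_features))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        # Sigmoid transfer: velocity -> probability of selecting each bit
        pos = (rng.random((n_particles, n_features))
               < 1.0 / (1.0 + np.exp(-vel))).astype(int)
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        if pbest_fit.max() > gbest_fit:
            gbest = pbest[pbest_fit.argmax()].copy()
            gbest_fit = pbest_fit.max()
            stall = 0
        else:
            stall += 1
        if stall >= stall_limit:  # adaptive mutation on stagnation
            worst = fit.argsort()[: n_particles // 4]
            pos[worst] = (rng.random((len(worst), n_features)) < 0.5).astype(int)
            stall = 0
    return gbest, gbest_fit
```

Each particle's position is a 0/1 mask over features; `fitness` would normally wrap a classifier's cross-validated accuracy on the selected columns.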
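The stability criterion added to the fitness function could, for example, be the average pairwise Jaccard similarity between the subsets selected on resampled data, combined with accuracy and subset size by a weighted sum. The weights below are illustrative assumptions, not the thesis's values:

```python
import numpy as np

def jaccard_stability(subsets):
    """Average pairwise Jaccard similarity of feature subsets
    (lists of selected indices); 1.0 means fully stable."""
    sims = []
    for i in range(len(subsets)):
        for j in range(i + 1, len(subsets)):
            a, b = set(subsets[i]), set(subsets[j])
            sims.append(len(a & b) / len(a | b))
    return float(np.mean(sims))

def combined_fitness(accuracy, stability, n_selected, n_total,
                     alpha=0.7, beta=0.2, gamma=0.1):
    # Weighted aggregation of the three objectives: accuracy,
    # subset stability, and subset compactness. The weights are
    # illustrative, not taken from the thesis.
    return alpha * accuracy + beta * stability + gamma * (1 - n_selected / n_total)
```

With stability in the objective, the search is pulled toward subsets that recur across sample perturbations rather than toward subsets that happen to score well on one split.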
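The ensemble step can likewise be sketched as bootstrap aggregation of a base selector: run the selector on resampled data and keep the features chosen most often. This frequency-voting scheme is one common aggregation rule, assumed here for illustration rather than taken from the thesis:

```python
import numpy as np

def ensemble_select(select_fn, X, y, n_rounds=10, top_k=5, seed=0):
    """Ensemble feature selection by bootstrap aggregation:
    `select_fn(X, y)` is any base selector returning a list of
    selected feature indices; features are ranked by how often
    they are selected across bootstrap resamples."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(X.shape[1])
    for _ in range(n_rounds):
        idx = rng.integers(0, len(X), len(X))  # bootstrap resample
        votes[select_fn(X[idx], y[idx])] += 1
    return np.argsort(-votes)[:top_k]
```

Because the final subset reflects agreement across many perturbed samples, it changes less when the sample itself changes, which is exactly the stability property the thesis targets.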
Keywords/Search Tags: high-dimensional small sample, feature selection, stability, multi-objective, ensemble