| The development of gene microarray technology allows researchers to obtain large amounts of gene expression profile data quickly and conveniently, which opens new possibilities for diagnosing and analyzing disease at the molecular level. How to use data mining techniques to extract valuable information from gene expression data has therefore become the key to using these data effectively. Gene expression profile data usually contain tens of thousands of gene expression values. At the same time, because gene expression detection is costly, expression profiles of tumor cases are relatively scarce. High dimensionality and small sample size are thus the main characteristics of tumor gene data, and they inevitably lead to the curse of dimensionality. In addition, only a small fraction of genes are usually related to a tumor disease, so existing methods cannot analyze tumor gene data effectively when applied directly. This poses a great challenge to the diagnosis and recognition of tumor diseases and has become one of the main problems currently facing data mining. Dimensionality reduction is an effective means of processing tumor data: it can select the genes related to a tumor disease or extract components that are discriminative for tumor identification. This article therefore proposes two dimensionality reduction algorithms based on the characteristics of tumor gene datasets. They obtain more discriminative feature subsets or components in order to improve the classification of tumor data, and their effectiveness is verified through experiments on multiple tumor gene datasets. The main work of this article is as follows: (1) To address the difficulty of selecting a relevant feature subset from high-dimensional tumor gene data, a hybrid feature selection method is proposed that combines the minimum redundancy maximum relevance algorithm (mRMR) with an improved krill herd algorithm (IKH) to select a highly relevant feature subset. The fitness function is a weighted combination of the 5-fold cross-validation classification accuracy and the number of selected features. An exponential nonlinear step-size decay strategy and chaotic mutation of elite particles are used to strengthen the global search ability of the krill herd algorithm. Experiments on a number of public tumor gene datasets show that the proposed algorithm achieves better classification performance while selecting smaller feature subsets. (2) To address the low recognition rate of tumor types caused by the large number of irrelevant and redundant features in tumor gene datasets and by the small sample size, a hybrid dimensionality reduction algorithm is proposed that combines mRMR feature selection, sequential forward selection (SFS) feature ranking, and partial least squares (PLS) feature extraction. The algorithm first uses mRMR to produce a preliminary feature ranking and select a certain number of features with high relevance and low redundancy, then uses SFS to improve the quality of that ranking. It then iteratively adds the top-ranked features, uses PLS to extract the corresponding components, builds a classification model on the extracted components, and computes the classification accuracy. Finally, the highest classification accuracy across the iterations is taken as the final result. Experimental results on 6 public tumor gene datasets show that the algorithm has good classification ability. |
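
The abstract does not give the exact form of the fitness function or of the step-size schedule used in the improved krill herd algorithm, so the following is only a minimal sketch under stated assumptions. It assumes the fitness is a convex combination of cross-validation accuracy and a feature-count reward, and that the step size decays geometrically from a starting value to a final value. The names `fitness` and `step_size` and the parameters `weight`, `start`, and `end` are hypothetical and chosen for illustration.

```python
def fitness(cv_accuracy, n_selected, n_total, weight=0.9):
    """Score a candidate feature subset (assumed form, not the paper's formula).

    cv_accuracy: mean 5-fold cross-validation accuracy in [0, 1].
    n_selected:  number of features in the candidate subset.
    n_total:     number of features available after mRMR pre-filtering.
    weight:      hypothetical trade-off between accuracy and subset size.
    """
    # An empty or oversized subset cannot be evaluated, so it scores zero.
    if not 0 < n_selected <= n_total:
        return 0.0
    # Reward small subsets: the term approaches 1 as fewer features are kept.
    feature_term = 1.0 - n_selected / n_total
    return weight * cv_accuracy + (1.0 - weight) * feature_term


def step_size(iteration, max_iterations, start=1.0, end=0.01):
    """Exponential nonlinear step-size decay (assumed schedule).

    Returns `start` at iteration 0 and `end` at iteration `max_iterations`,
    shrinking by a constant ratio per iteration in between.
    """
    progress = iteration / max_iterations
    return start * (end / start) ** progress


if __name__ == "__main__":
    # Two subsets with equal accuracy: the smaller one gets the higher fitness.
    print(fitness(0.90, 10, 100), fitness(0.90, 50, 100))
    # The step size shrinks quickly early on and more slowly later.
    print([round(step_size(t, 100), 4) for t in (0, 25, 50, 75, 100)])
```

With a schedule of this kind, large early steps favor exploration of the feature space and small late steps favor refinement around good subsets, which is the usual motivation for a decaying step size in swarm-based search.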