| Cancer is one of the diseases that threaten human life. In recent years, the number of deaths from cancer has been rising, and its mortality rate far exceeds that of other diseases. How to treat cancer, especially malignant tumors, has therefore become a focus of research. However, tumors generally have a variety of similar subtypes, so correctly classifying and diagnosing tumors is essential for targeted treatment. With the successful completion of the Human Genome Project and the advent of the post-genomic era, gene chip technology was born. Gene chip technology can measure the expression levels of thousands of genes simultaneously, and the expression values of these genes make up gene expression profile data. However, due to various subjective and objective factors, gene expression profile data usually contain a large amount of noise, redundancy, irrelevant features, and outliers. The data are also nonlinear, high-dimensional, and small in sample size, which makes processing and analysis a considerable challenge. How to select the relevant information from gene expression profile data and classify it accurately is therefore the focus of this research. The main work of this paper consists of the following two points. First, this paper proposes a feature selection method based on ReliefF and SVM-RFE, which reduces the dimension of a tumor gene expression profile data set in two steps. The first dimension reduction step uses the ReliefF algorithm to compute a weight for each feature; features whose weight falls below an adaptive threshold are then eliminated, which achieves the initial dimension reduction. However, ReliefF can only eliminate irrelevant features and cannot remove redundant ones, so SVM-RFE is introduced in the second step. The second dimension reduction step uses the SVM-RFE algorithm to 
rank each feature, eliminates one or more of the lowest-ranked features, and finds the optimal subset through multiple iterations. Combining these two methods yields a new feature selection method that can eliminate the noisy, redundant, and irrelevant features of the data set, thereby identifying the features associated with tumor classification, improving classification accuracy, and reducing the workload. Second, because traditional classification methods have low accuracy and are prone to overfitting on high-dimensional, small-sample data, this paper proposes a classification method based on improved sparse representation. The method consists of three stages. The first and second stages are the same as the two dimension reduction steps described above. The reduced data set produced by the first two stages serves as the input to the third stage, in which a sparse representation classifier solves for the sparse coefficients and then computes the reconstruction error from those coefficients. Finally, each sample is classified according to its reconstruction error. The comparative experiments show that the method achieves better performance on some data sets. |
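
The first dimension reduction step (ReliefF weighting followed by thresholding) can be illustrated with a minimal sketch. This is a simplified, Relief-style weighting, not the implementation used in this work: all samples from other classes are pooled as "misses" rather than weighted by class prior as in full ReliefF. The abstract does not define the adaptive threshold, so the mean weight is used here purely as an assumption. The function names `relief_weights` and `select_by_threshold`, the parameter `n_neighbors`, and the synthetic data are all hypothetical.

```python
import numpy as np

def relief_weights(X, y, n_neighbors=3):
    """Simplified Relief-style feature weights (misses pooled over all other classes)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_samples, n_features = X.shape
    # Scale each feature to [0, 1] so per-feature differences are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span
    weights = np.zeros(n_features)
    for i in range(n_samples):
        diff = np.abs(Xs - Xs[i])      # per-feature difference to every sample
        dist = diff.sum(axis=1)        # Manhattan distance to every sample
        same = np.flatnonzero(y == y[i])
        same = same[same != i]         # exclude the sample itself
        other = np.flatnonzero(y != y[i])
        hits = same[np.argsort(dist[same])[:n_neighbors]]
        misses = other[np.argsort(dist[other])[:n_neighbors]]
        # A useful feature differs on nearest misses and agrees on nearest hits.
        weights += diff[misses].mean(axis=0) - diff[hits].mean(axis=0)
    return weights / n_samples

def select_by_threshold(weights):
    # Assumption: the "adaptive threshold" is the mean weight (not defined in the source).
    threshold = weights.mean()
    return np.flatnonzero(weights >= threshold), threshold

# Synthetic stand-in for an expression matrix: 20 samples, 6 genes,
# of which only genes 0 and 1 carry class information.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 10)
X = rng.normal(size=(20, 6))
X[:, 0] += 5.0 * y
X[:, 1] -= 5.0 * y

weights = relief_weights(X, y)
selected, threshold = select_by_threshold(weights)
print("weights:", np.round(weights, 3))
print("kept feature indices:", selected)
```

A mean-based threshold adapts to the scale of the weights on each data set, which is one plausible reading of "adaptive"; a percentile or a standard-deviation-based cut-off would fit the description equally well.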
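
The second dimension reduction step (SVM-RFE) can be sketched with scikit-learn's `RFE` wrapper around a linear-kernel `SVC`. This is an illustration under assumptions, not the configuration used in this work: the final subset size (`n_features_to_select=2`), the number of features dropped per iteration (`step=1`), the value of `C`, and the synthetic data are all hypothetical choices.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic stand-in for the data set left after the first reduction step:
# 40 samples, 8 features, of which features 0 and 1 are shifted by class.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 8))
X[:, 0] += 4.0 * y
X[:, 1] -= 4.0 * y

# A linear kernel exposes coef_, which RFE uses to rank the features.
# Each iteration refits the SVM and drops the `step` lowest-ranked features.
selector = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=2, step=1)
selector.fit(X, y)

kept = np.flatnonzero(selector.support_)   # indices of the surviving features
print("kept feature indices:", kept)
print("elimination ranking (1 = kept):", selector.ranking_)
```

Fixing the subset size in advance is a simplification. Searching for the optimal subset over multiple iterations, as the abstract describes, would require scoring each candidate subset, for example by cross-validated accuracy.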
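
The third stage (classification by reconstruction error) can be sketched as follows. This is a generic sparse representation classifier, not the improved variant proposed in this work, whose details the abstract does not give. The sparse coefficients are approximated with orthogonal matching pursuit, a greedy substitute for the L1 minimisation commonly used in sparse representation classification. The function names `omp` and `src_predict`, the parameter `n_nonzero`, and the toy data are hypothetical.

```python
import numpy as np

def omp(D, x, n_nonzero):
    """Orthogonal matching pursuit: greedy sparse code of x over the columns of D."""
    coef = np.zeros(D.shape[1])
    support = []
    residual = x.copy()
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Refit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        coef[support] = sol
        residual = x - D[:, support] @ sol
        if np.linalg.norm(residual) < 1e-10:
            break
    return coef

def src_predict(train_X, train_y, x, n_nonzero=5):
    """Assign x to the class whose training samples reconstruct it with least error."""
    # Dictionary columns are the L2-normalised training samples (assumed non-zero).
    D = train_X.T / np.linalg.norm(train_X, axis=1)
    coef = omp(D, x, n_nonzero)
    classes = np.unique(train_y)
    errors = []
    for c in classes:
        mask = train_y == c
        # Reconstruct x using only the coefficients that belong to class c.
        errors.append(float(np.linalg.norm(x - D[:, mask] @ coef[mask])))
    return classes[int(np.argmin(errors))], errors

# Toy data: rows are samples, columns are the features kept by the first two stages.
train_X = np.array([[1.0, 0.1, 0.0],
                    [0.9, 0.0, 0.1],
                    [0.1, 1.0, 0.0],
                    [0.0, 0.9, 0.1]])
train_y = np.array([0, 0, 1, 1])

label_a, errors_a = src_predict(train_X, train_y, np.array([0.95, 0.05, 0.05]), n_nonzero=2)
label_b, errors_b = src_predict(train_X, train_y, np.array([0.05, 0.95, 0.05]), n_nonzero=2)
print(label_a, errors_a)
print(label_b, errors_b)
```

Because the decision depends only on per-class reconstruction errors, no separate model has to be trained; the training samples themselves act as the dictionary.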