Although tumor treatment has improved with the rapid development of medical technology, the high incidence and low survival rate of malignant tumors remain a serious threat to human life. If tumors can be detected early and diagnosed accurately, and a personalized treatment plan is developed from the diagnosis, patient survival can be greatly improved. Research has shown that gene expression profile data provide more information than traditional tumor classification systems based on morphology and histology, so identifying and classifying tumors from an informatics perspective has become an active research field. However, the gene expression profile data usually obtained are high-dimensional with small sample sizes, which makes it very difficult to identify and analyze the key information directly. To address the problems this poses for classification, researchers have proposed a variety of dimensionality reduction methods. Gene expression data, however, may contain important correlation structure, and genes can be divided into groups according to their biological pathways. Existing methods fail to take this correlation structure into account. From both a theoretical and a biological perspective, an ideal gene selection method should use this structural information. To address these problems, this paper makes the following contributions:

1. Traditional filter-based feature subset selection does not consider the relationships among gene features. We therefore propose a feature selection algorithm based on the correlation of strongly expressed features. After the key genes have been screened from the gene expression profile data set, the algorithm searches all remaining genes for features strongly similar to them, builds a set of such features, and seeks the target subset globally. This avoids considering only the features screened by feature ranking while ignoring some key features. Experimental results on several real data sets show that the proposed algorithm performs well.

2. Feature selection is generally hampered by high noise and small sample sizes, and existing methods often do not consider the relationships between features. To address this, we propose a feature selection algorithm based on orthogonal regression with global redundancy minimization and manifold regularization. A global redundancy matrix is introduced into the orthogonal regression model, and a manifold regularization term is designed on that basis. Orthogonal regression serves as the embedded statistical model and preserves more statistical and structural information than the traditional least squares regression embedding. Adding a feature weight matrix to the model yields a feature score vector whose values can be used to compare the importance of features for the classification task, as in filter-based feature selection. The global redundancy term evaluates redundancy from a global perspective, which helps screen valuable information. The manifold regularization term preserves the internal spatial structure of the target feature subset after dimensionality reduction. Comparative experiments on a large number of real data sets demonstrate the superiority of the proposed algorithm.

3. In the algorithm of contribution 2, the loss between soft regression results and hard targets cannot accurately reflect classification ability. We therefore improve it and propose a large-margin orthogonal regression feature selection algorithm with manifold regularization. The proposed model combines a large-margin constraint with the orthogonal constraint while keeping the original advantages: it retains more statistical and structural information, the values in the feature score vector can be used to compare the importance of features for the classification task, redundancy is evaluated from a global perspective, and the internal spatial structure is preserved after dimensionality reduction. Comparative experiments on a large number of real data sets, with the algorithm of contribution 2 included as a baseline, achieved satisfactory results.
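
As a rough illustration of the first contribution only, the idea of screening key genes with a filter and then expanding the subset with strongly correlated genes from the remainder might be sketched as follows. This is a minimal sketch under stated assumptions, not the algorithm proposed in the paper: the relevance score (absolute Pearson correlation with a binary label), the correlation threshold, and the names `pearson`, `select_genes`, `n_key`, and `corr_threshold` are all hypothetical choices made for the example.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)


def select_genes(X, y, n_key=1, corr_threshold=0.9):
    """Two-stage selection sketch (assumed scoring, not the paper's).

    X: list of samples, each a list of gene expression values.
    y: binary class labels (0 or 1), one per sample.
    Returns (key gene indices, full selected subset).
    """
    n_genes = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_genes)]

    # Filter step: rank genes by absolute correlation with the class label.
    relevance = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(n_genes), key=lambda j: relevance[j], reverse=True)
    key = ranked[:n_key]

    # Expansion step: scan all remaining genes and keep those strongly
    # correlated with at least one key gene, so that genes dropped by
    # ranking alone can still enter the subset.
    selected = list(key)
    for j in ranked[n_key:]:
        if any(abs(pearson(columns[j], columns[k])) >= corr_threshold for k in key):
            selected.append(j)
    return key, selected


# Toy data: 6 samples, 4 genes. Gene 0 separates the classes, gene 1
# tracks gene 0 closely, genes 2 and 3 are unrelated to the label.
X = [
    [1, 1, 3, 5],
    [1, 2, 7, 3],
    [1, 1, 5, 4],
    [5, 8, 4, 4],
    [5, 9, 6, 5],
    [5, 8, 5, 3],
]
y = [0, 0, 0, 1, 1, 1]
key, selected = select_genes(X, y, n_key=1, corr_threshold=0.9)
```

In this toy example a ranking with `n_key=1` would keep only gene 0, while the expansion step also admits gene 1 because of its strong correlation with the key gene.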