| Cancer is now the second leading cause of death worldwide, and timely detection and accurate diagnosis are crucial for formulating treatment plans, increasing cure rates, and improving patients’ quality of life. At the same time, the completion of the Human Genome Project has prompted researchers to use gene data to explore the intrinsic relationship between human physiology and pathology and to establish effective methods for combating disease. Most diseases, including cancer, are closely related to the development and mutation of human genes, so cancer classification based on gene data is of great significance in clinical medicine. However, gene expression data are massive, complex, and growing exponentially, which poses challenges for both dimensionality reduction and classification accuracy. This paper takes two gene expression data sets, breast cancer and DLBCL-B, as its research objects and explores variable selection and classification methods in turn. The specific work is as follows. For variable selection, this paper proposes a maximum relevance minimum conditional redundancy method tailored to the characteristics of gene expression data: complex structure, large volume, high dimensionality, and small sample sizes. The method treats the conditional mutual information between a candidate variable and an already selected variable, given the response variable, as the redundancy. Using three classifiers (random forest, support vector machine, and BP neural network), it is compared with variable selection based on random forest feature importance and with max-relevance min-redundancy to verify the effectiveness of the proposed method. For classification, this paper proposes a deep neural network model based on the gray wolf optimization algorithm. The method uses the training error of the deep neural network as the fitness function of the gray wolf optimization algorithm and determines the number of hidden layers and hidden nodes through the wolves' iterative search for the best fitness value. This removes the limitation of traditional modeling, in which structural parameters must be set through repeated experiments. In addition, this paper combines the gray wolf optimization algorithm with a convolutional neural network to classify gene data, using the algorithm to determine the learning rate of the convolutional neural network model. The effectiveness of the proposed method is verified by comparing its classification accuracy with that of the same model without parameter optimization and with other machine learning methods. In summary, this paper takes gene expression data as its research object and proposes a new variable selection method and a method for determining deep neural network parameters, which can provide an effective solution for disease diagnosis based on gene data and reliable decision support for decision makers. The work has reference and practical value. |
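
The variable selection idea described above can be illustrated with a minimal sketch. The abstract does not give the exact scoring rule, so the code assumes a greedy criterion of relevance I(X_k; Y) minus the mean conditional redundancy I(X_k; X_j | Y) over the already selected variables X_j. It also assumes the expression values have been discretized beforehand. The function names (`entropy`, `mutual_info`, `conditional_mutual_info`, `select_variables`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def entropy(*columns):
    """Shannon entropy (in nats) of the joint distribution of discrete columns."""
    joint = np.column_stack(columns)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def mutual_info(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(x) + entropy(y) - entropy(x, y)

def conditional_mutual_info(x, z, y):
    # I(X;Z|Y) = H(X,Y) + H(Z,Y) - H(X,Z,Y) - H(Y)
    return entropy(x, y) + entropy(z, y) - entropy(x, z, y) - entropy(y)

def select_variables(X, y, k):
    """Greedily pick k column indices of X.

    Assumed score (not confirmed by the source):
    relevance I(X_j;Y) minus the mean of I(X_j;X_s|Y) over selected s.
    """
    n_features = X.shape[1]
    relevance = np.array([mutual_info(X[:, j], y) for j in range(n_features)])
    # Start from the single most relevant variable.
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean(
                [conditional_mutual_info(X[:, j], X[:, s], y) for s in selected]
            )
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

Under this assumed criterion, an exact copy of an already selected variable is penalized by the information it shares with that variable beyond what the response explains. A weaker variable that is conditionally independent of the selected ones keeps almost all of its relevance. The plug-in entropy estimates used here are biased for small samples, so a real high-dimensional, small-sample study would likely need a more careful estimator.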
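
The structure search described above can be sketched in the same hedged way. The code below implements the standard gray wolf position update (alpha, beta, and delta leaders, with the coefficient `a` decreasing linearly from 2 to 0) and rounds each wolf's continuous position to a pair (number of hidden layers, nodes per layer). Training a real network is out of scope for a short example, so `surrogate_training_error` is a hypothetical stand-in for "train the network with this structure and return its training error". The search bounds and all function names are likewise assumptions, not values from the paper.

```python
import numpy as np

def grey_wolf_optimize(fitness, lower, upper, n_wolves=12, n_iter=60, seed=0):
    """Minimise `fitness` over a box using the standard gray wolf update rules."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    wolves = rng.uniform(lower, upper, size=(n_wolves, lower.size))
    best_pos, best_fit = None, np.inf
    for t in range(n_iter):
        scores = np.array([fitness(w) for w in wolves])
        order = np.argsort(scores)
        # Remember the best position evaluated so far.
        if scores[order[0]] < best_fit:
            best_fit, best_pos = float(scores[order[0]]), wolves[order[0]].copy()
        leaders = [wolves[i].copy() for i in order[:3]]  # alpha, beta, delta
        a = 2.0 * (1 - t / n_iter)  # decreases linearly from 2 towards 0
        new = np.zeros_like(wolves)
        for leader in leaders:
            r1, r2 = rng.random(wolves.shape), rng.random(wolves.shape)
            A, C = 2 * a * r1 - a, 2 * r2
            D = np.abs(C * leader - wolves)
            new += leader - A * D
        # Each wolf moves to the average of the three leader-guided positions.
        wolves = np.clip(new / 3.0, lower, upper)
    return best_pos, best_fit

def decode(position):
    """Map a continuous wolf position to (hidden layers, nodes per layer)."""
    return int(round(position[0])), int(round(position[1]))

def surrogate_training_error(position):
    # Hypothetical stand-in: a real run would train a network with this
    # structure and return its training error instead.
    n_layers, n_nodes = decode(position)
    return (n_layers - 3) ** 2 + ((n_nodes - 64) / 16.0) ** 2
```

A call such as `grey_wolf_optimize(surrogate_training_error, lower=[1, 8], upper=[6, 128])` returns the best position found and its fitness, and `decode` turns that position into a candidate structure. The same routine could in principle search a one-dimensional space for the learning rate of a convolutional neural network, with the fitness function replaced accordingly.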