Cancer is an increasing threat to human survival and health. It can arise in different parts of the body and in different forms, giving rise to many types of cancer. Even cancers of the same type present different subtypes because of diverse gene mutations, which makes conventional diagnosis and treatment more difficult. To select more effective, individualized treatment regimens for cancer patients, it is essential to classify cancer subtypes accurately and to identify key disease genes. In this thesis, we first review existing bioinformatics methods for multi-subtype cancer classification. We then propose the Elastic Net Regularized Softmax Regression (ENRSR) model, a neural network without hidden layers, for multi-subtype cancer classification and key disease gene selection. The model fits a softmax regression to gene expression profiles under an elastic net sparsity penalty, so that it performs subtype classification and key disease gene selection simultaneously. The ENRSR model was tested on simulated data and on three sets of gene expression profiles (breast cancer, small round blue cell tumor, and leukemia). Performance was evaluated by k-fold cross-validation and the BCubed F-score, in comparison with conventional clustering and classification methods: K-means (Kmeans), Hierarchical Clustering (Hclust), Non-negative Matrix Factorization (NMF), Expectation Maximization (EM), Support Vector Machine (SVM), and Random Forest (RF). The results showed that the ENRSR model achieved better classification results, and GO enrichment analysis of the selected key disease genes indicated that they are closely related to the corresponding cancers, consistent with previous studies. However, the ENRSR model is computationally demanding. We therefore further designed a fully connected Multi-layer Neural Network (MLNN) with two hidden layers for multi-subtype cancer classification. To allow key disease gene selection, we adopted the ReLU activation function in the basic MLNN model. The MLNN was likewise evaluated on the three real gene expression datasets used for the ENRSR model, and the results showed that it achieves good classification performance. Because ReLU is a piecewise linear activation function, key disease genes can be selected simply according to the values of the weights in the first hidden layer. The genes selected in this way were also found to be biologically consistent with the existing literature. The thesis closes with a summary and an outlook on follow-up work.
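
To make the idea behind ENRSR concrete, the following is a minimal sketch of a softmax regression with an elastic net penalty, where genes whose weights are all driven to exactly zero are discarded and the rest are treated as selected. It is an illustration under stated assumptions, not the implementation used in the thesis: the optimizer (proximal gradient descent with soft-thresholding), the function names `fit_enet_softmax`, `selected_genes`, and `make_toy_data`, the hyperparameters `lam`, `alpha`, and `lr`, and the toy data are all hypothetical choices made for this example.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_enet_softmax(X, y, n_classes, lam=0.2, alpha=0.9, lr=0.1, n_iter=500):
    """Softmax regression with penalty lam * (alpha*|W|_1 + (1-alpha)/2*|W|_2^2).

    Assumption: proximal gradient descent is used here for simplicity;
    the thesis does not state which optimizer it uses.
    """
    n, p = X.shape
    W = np.zeros((p, n_classes))   # one weight per (gene, subtype) pair
    b = np.zeros(n_classes)        # unpenalized intercepts
    Y = np.eye(n_classes)[y]       # one-hot subtype labels
    for _ in range(n_iter):
        R = softmax(X @ W + b) - Y                       # residuals
        # Gradient step on cross-entropy plus the smooth ridge part.
        W = W - lr * (X.T @ R / n + lam * (1.0 - alpha) * W)
        b = b - lr * R.mean(axis=0)
        # Soft-thresholding handles the L1 part and yields exact zeros.
        W = np.sign(W) * np.maximum(np.abs(W) - lr * lam * alpha, 0.0)
    return W, b

def predict(X, W, b):
    """Assign each sample to the subtype with the highest score."""
    return np.argmax(X @ W + b, axis=1)

def selected_genes(W):
    """Indices of genes with at least one nonzero weight."""
    return np.flatnonzero(np.abs(W).sum(axis=1) > 0)

def make_toy_data(seed=0, n_per_class=50, n_genes=50, n_classes=3):
    """Hypothetical data: gene k is up-regulated in subtype k, the rest is noise."""
    rng = np.random.default_rng(seed)
    y = np.repeat(np.arange(n_classes), n_per_class)
    X = rng.normal(size=(y.size, n_genes))
    for k in range(n_classes):
        X[y == k, k] += 3.0
    return X, y

X, y = make_toy_data()
W, b = fit_enet_softmax(X, y, n_classes=3)
print("selected genes:", selected_genes(W))
print("training accuracy:", np.mean(predict(X, W, b) == y))
```

In this sketch the L1 term is what produces sparsity and therefore gene selection, while the ridge term keeps the weights of correlated genes stable. On real expression profiles the penalty strength would be tuned, for example by the k-fold cross-validation mentioned above.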