
The Research On The Classification Of Cancer Subtypes Based On Deep Flexible Neural Forest

Posted on: 2021-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: J Xu
GTID: 2404330605960607
Subject: Computer Science and Technology
Abstract/Summary:
A single type of cancer can be subdivided into many subtypes, and different subtypes differ in prognosis and in response to treatment. Discovering and determining cancer subtypes is therefore vital to cancer treatment and is a key basis for providing personalized, accurate therapy to patients. Using genome sequencing technology to obtain cancer genome data, researchers can classify cancer subtypes at the molecular level. However, gene expression data are high-dimensional, small in sample size, noisy and highly redundant, so the accuracy of traditional machine learning methods is easily degraded when such data are used to predict cancer subtypes. To avoid this, noise and redundant information in the gene expression data should be reduced on the one hand, and classification models suited to high-dimensional, small-sample data should be carefully designed on the other. Building on a review of related research on cancer subtype classification, this thesis addresses the use of gene expression data to classify cancer subtypes from three aspects: feature gene selection, classifier performance improvement, and integration of multi-omics data. A feature selection method based on Fisher ratio and neighborhood rough set is proposed, a deep flexible neural forest model is established, and a hierarchical integration deep flexible neural forest framework is proposed. The experimental results show that feature selection combined with the new classification models significantly improves the accuracy of cancer subtype classification, and that a subset of feature genes with an important effect on the cancer subtype can be identified, providing an important basis for subsequent precision medicine. The main research contents are as follows:

(1) A gene selection method based on Fisher ratio and neighborhood rough set is proposed. First, Fisher ratio is used to rank all genes, and the top k genes are selected as the preliminary gene subset, filtering out irrelevant genes. Then a forward greedy numerical attribute reduction algorithm based on neighborhood rough set performs the final gene selection, further eliminating redundant genes. The method thus adopts a "preselection + final selection" strategy. Combining Fisher ratio with the neighborhood rough set algorithm effectively removes a large number of irrelevant genes, reduces the time and space cost of the neighborhood rough set computation, and shortens classifier training time. Experimental results show that the proposed algorithm is superior to using the original data, Fisher ratio, neighborhood rough set, and the maximum relevance minimum redundancy (MRMR) algorithm in terms of both the number of selected features and classification accuracy.

(2) A deep flexible neural forest model for cancer subtype classification is proposed. The model is an ensemble of flexible neural trees (FNTs) and addresses two limitations of the FNT: it cannot directly handle multi-class problems, and increasing model depth makes the parameter optimization algorithms costly. First, the deep flexible neural forest model adopts the M-ary method, using an ensemble of multiple FNTs at each layer to handle multi-class problems. Second, it adopts a cascade structure, which increases the depth of the whole model without increasing the number of FNT parameters. Through the tree structure optimization algorithm, the FNT structure is selected automatically and the number of cascade levels is determined adaptively, so the model can be applied to small-scale genomic data. The experimental results show that the proposed algorithm outperforms traditional classification algorithms such as K-nearest neighbor (KNN), support vector machine (SVM) and multi-layer perceptron (MLP), as well as ensemble classifiers such as random forest (RF) and deep forest (gcForest).

(3) A hierarchical integration deep flexible neural forest framework is proposed to integrate multi-omics data for cancer subtype classification. The method targets the heterogeneity and complexity of cancer. First, a stacked autoencoder (SAE) is used to learn high-level representations of each omics data type separately; then all the learned high-level representations are merged and fed into a further autoencoder layer to learn more complex data representations; finally, the learned complex representation is used as input to the deep flexible neural forest model. By learning the high-level representation of each omics data type through separate autoencoders, the framework takes the intrinsic properties of each data type into account; by integrating the multi-omics data in a shared autoencoder layer, it takes the correlations between the omics data types into account. The experimental results show that the proposed model, integrating gene expression data, miRNA expression data and DNA methylation data, achieves higher classification accuracy than using gene expression data alone, and that its classification performance is better than that of K-nearest neighbor (KNN), support vector machine (SVM), random forest (RF) and deep forest (gcForest) classifiers and of the multi-omics data integration method mixOmics.
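The preselection step in contribution (1), ranking genes by Fisher ratio and keeping the top k, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the exact Fisher ratio formula, so the common multi-class form (between-class scatter divided by within-class scatter, computed per gene) is assumed, and the function names `fisher_ratio` and `preselect_top_k` are hypothetical. The neighborhood rough set reduction that follows preselection is not shown.

```python
import numpy as np


def fisher_ratio(X, y):
    """Per-gene Fisher ratio: between-class scatter / within-class scatter.

    Assumed form (not taken from the thesis):
        F_j = sum_c n_c (mu_cj - mu_j)^2 / sum_c n_c var_cj
    X has shape (n_samples, n_genes); y holds one class label per sample.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]          # samples belonging to this subtype
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    # Small constant avoids division by zero for genes that are
    # constant within every class.
    return between / (within + 1e-12)


def preselect_top_k(X, y, k):
    """Return the column indices of the k genes with the highest Fisher ratio."""
    scores = fisher_ratio(X, y)
    order = np.argsort(scores)[::-1]  # descending by score
    return order[:k]
```

A call such as `preselect_top_k(X, y, k)` returns column indices, so `X[:, preselect_top_k(X, y, k)]` would give the reduced matrix passed on to the final selection stage; the value of k is a tuning choice that the abstract does not specify.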
Keywords/Search Tags:Classification of cancer subtypes, Cascade forest, Gene selection, Data integration, Deep learning