Research On Gastric Cancer Subtypes Classification Model Based On Fusion Data Of Multi-omics

Posted on: 2021-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Fan
GTID: 2404330602473594
Subject: Electronic and communication engineering
Abstract/Summary:
Cancer is a major threat to human health; it is complex and takes many forms. With the rapid development of biomedical technology, many cancer subtypes have been discovered, and different subtypes arise through different mechanisms. Accurate classification of cancer subtypes is therefore essential for initial diagnosis and targeted treatment. The development of high-throughput technology has produced large volumes of gene sequence, transcription, and protein data, and making full use of these data to identify clinically relevant cancer subtypes is an important research direction. Among common cancers, gastric cancer has an extremely high incidence rate, and the clinical practice of diagnosing subtypes from morphology and images has shortcomings and low accuracy. Using microRNA (miRNA) and DNA methylation data from gastric cancer samples in the TCGA (The Cancer Genome Atlas) database, this thesis proposes a model for balancing multi-class data sets and a gastric cancer subtype classification model, which together address the unbalanced distribution of subtypes and classify gastric cancer subtypes more accurately. The work comprises three parts:

(1) For two small-sample data types closely related to gastric cancer, miRNA and DNA methylation, an autoencoder is used to fuse the two omics data sets so as to make full use of the complementary information between them. Because the data have a small sample size, high feature dimensionality, and substantial redundancy, the equalized Lasso (KLasso) algorithm is used for feature selection, and an attention mechanism then assigns weights to the selected features. Experimental results verify the effectiveness of this approach, which improves classification accuracy.

(2) To address the unbalanced distribution of subtypes among the gastric cancer samples, a hybrid model based on balanced feedback sampling and the Tomek link method is proposed. It balances the sample sizes of the four subtypes and improves the classification results.

(3) A Two Boosting Deep Forest (TBDForest) classification model is constructed to improve classification on small-sample data. It makes two optimizations to the deep forest model: first, each cascade layer is divided into two sub-layers, giving the model more opportunities to learn and improving classification accuracy; second, the performance of the random trees in the ensemble is taken into account and added to each cascade layer in the form of a standard deviation, which strengthens the influence of the sub-classifiers on the classification result and reduces the risk of over-fitting.

Finally, the model is compared with five classification models widely used in medical research: Support Vector Machine (SVM), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Convolutional Neural Network (CNN), and multi-Grained Cascade Forest (gcForest), using several classification performance metrics. The experimental results show that the proposed gastric cancer subtype classification model has clear advantages on the TCGA multi-omics fusion data of gastric cancer, reaching an accuracy of 97.87%, and it achieves an accuracy of 95.28% on a gastric cancer patient database provided by the School of Medicine. The model outperforms the other methods on these metrics and shows good generalization ability.
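Part (1) selects features with an equalized Lasso (KLasso) variant whose exact formulation is not given in this summary. As a rough illustration of the underlying idea only, the sketch below fits a plain, unequalized Lasso by cyclic coordinate descent in pure Python and keeps the features whose coefficients stay non-zero. The function names, the regularization strength `lam`, and the toy data are assumptions for illustration, not the thesis's method; the autoencoder fusion step is not shown.

```python
def soft_threshold(z: float, gamma: float) -> float:
    """Proximal operator of the L1 penalty: shrink z towards zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent.

    Plain Lasso (NOT the thesis's KLasso). X is a list of n rows with p
    features; y is a list of n targets. Features are assumed centred, so no
    intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    # (1/n) * sum_i x_ij^2 for each feature j
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    residual = list(y)  # r = y - Xw, and w starts at zero
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant-zero feature: coefficient stays 0
            # Correlation of feature j with the partial residual that excludes j
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta  # keep r = y - Xw up to date
                w[j] = new_wj
    return w


def select_features(X, y, lam):
    """Indices of features whose Lasso coefficients are (numerically) non-zero."""
    w = lasso_coordinate_descent(X, y, lam)
    return [j for j, wj in enumerate(w) if abs(wj) > 1e-8]


# Toy data (hypothetical): y depends only on feature 0; feature 1 is orthogonal noise.
X = [[1.0, 1.0], [-1.0, 1.0], [2.0, -1.0], [-2.0, -1.0]]
y = [3.0, -3.0, 6.0, -6.0]
print(select_features(X, y, lam=0.1))
```

The L1 penalty drives the irrelevant coefficient exactly to zero, which is what makes Lasso usable as a feature selector on high-dimensional, small-sample data.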
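The attention step in part (1) is described only as assigning weights to the selected features. One common pattern, assumed here rather than taken from the thesis, scores each feature with a linear layer, normalizes the scores with a softmax, and rescales the input by the resulting weights. `W` and `b` stand in for hypothetical learned parameters; training them is not shown.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def feature_attention(x: List[float], W: List[List[float]],
                      b: List[float]) -> Tuple[List[float], List[float]]:
    """Weight each feature of one sample by a softmax attention score.

    Assumed form: scores_j = sum_k W[j][k] * x[k] + b[j], weights = softmax(scores),
    output_j = x_j * weights_j. W and b are hypothetical learned parameters.
    """
    scores = [sum(wjk * xk for wjk, xk in zip(row, x)) + bj for row, bj in zip(W, b)]
    weights = softmax(scores)
    return [xj * aj for xj, aj in zip(x, weights)], weights


out, weights = feature_attention([2.0, 4.0], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
print(out, weights)
```

With all-zero parameters every feature gets the same weight; a trained layer would instead emphasize the features that help separate the subtypes.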
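Part (2) combines balanced feedback sampling with Tomek links. The sampling scheme is not specified in this summary, so the sketch below covers only the standard Tomek-link cleaning step: two samples of different classes that are each other's nearest neighbours form a link, and removing one member of each link clears the class boundary. Dropping the member that belongs to the larger class is an assumed policy for the multi-class case, and the function names and toy data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_neighbour(X, i):
    """Index of the sample closest to X[i], excluding i itself."""
    best_j, best_d = -1, float("inf")
    for j in range(len(X)):
        if j != i:
            d = sq_dist(X[i], X[j])
            if d < best_d:
                best_j, best_d = j, d
    return best_j


def tomek_links(X, y):
    """Pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    nn = [nearest_neighbour(X, i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i and y[i] != y[j]]


def remove_tomek_majority(X, y):
    """Drop the member of each Tomek link whose class is larger (assumed policy)."""
    counts = {}
    for label in y:
        counts[label] = counts.get(label, 0) + 1
    drop = set()
    for i, j in tomek_links(X, y):
        drop.add(i if counts[y[i]] >= counts[y[j]] else j)
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]


X = [[0.0], [0.3], [1.0], [1.5], [1.6], [3.0]]
y = [0, 0, 0, 0, 1, 1]
print(tomek_links(X, y))
print(remove_tomek_majority(X, y))
```

On this toy data the points 1.5 (class 0) and 1.6 (class 1) form the only Tomek link, so the class-0 point is removed; the pair 0.0 and 0.3 are also mutual nearest neighbours but share a label, so they are kept.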
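For part (3), the summary says only that the performance of the random trees is added to each cascade layer in the form of a standard deviation. One possible reading, assumed here and not confirmed by the source, is that each forest passes on the per-class standard deviation of its trees' probability estimates alongside the usual mean class vector, and the next layer concatenates both with the original features as in gcForest. The sketch shows only that feature-augmentation step; the per-tree probabilities would come from trained forests, and the two-sub-layer split is not modelled. Function names are hypothetical.

```python
import statistics


def forest_class_vector(tree_probs):
    """Per-class mean and population std of the trees' probability outputs.

    tree_probs: one class-probability list per tree in a single forest.
    Appending the std is an ASSUMED interpretation of the TBDForest change.
    """
    n_classes = len(tree_probs[0])
    means = [statistics.fmean(t[c] for t in tree_probs) for c in range(n_classes)]
    stds = [statistics.pstdev([t[c] for t in tree_probs]) for c in range(n_classes)]
    return means, stds


def augment_features(x, forests_tree_probs):
    """Input for the next cascade layer: original features followed by
    each forest's mean class vector and std class vector."""
    out = list(x)
    for tree_probs in forests_tree_probs:
        means, stds = forest_class_vector(tree_probs)
        out.extend(means)
        out.extend(stds)
    return out


# Two hypothetical forests of two trees each, for a two-class problem.
forest_a = [[1.0, 0.0], [0.0, 1.0]]  # trees disagree: high std
forest_b = [[0.2, 0.8], [0.2, 0.8]]  # trees agree: zero std
print(augment_features([7.0], [forest_a, forest_b]))
```

The std terms let the next layer distinguish a confident forest from one whose trees disagree, even when their mean class vectors look similar.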
Keywords/Search Tags:miRNA, DNA methylation, multi-omics fusion, unbalanced, gastric cancer subtype classification model