Research On Classification Model Of Gastric Cancer Based On DNA Methylation Imbalanced Data

Posted on: 2020-11-17  Degree: Master  Type: Thesis
Country: China  Candidate: C Liu  Full Text: PDF
GTID: 2404330572499199  Subject: Electronic Science and Technology
Abstract/Summary:
Gastric cancer has the highest incidence of all cancers in China, and early gastric cancer has no recognizable symptoms, which makes it difficult to detect. Screening for early gastric cancer therefore has important clinical value for timely treatment. At present, gastric cancer is mostly classified from pathological images. This approach relies mainly on the clinical experience of the attending physician and has low accuracy. To improve on diagnosis based mainly on morphological and imaging methods, this paper proposes a classification scheme based on DNA methylation sequencing data, with the aim of classifying early gastric cancer accurately.

To address the class imbalance and high noise of the DNA methylation sequencing data in The Cancer Genome Atlas (TCGA), an integrated hybrid-sampling model based on the Synthetic Minority Oversampling Technique (SMOTE) and the Tomek Link algorithm is proposed. Because DNA methylation sequencing data have a small sample size and high dimensionality, ten-fold cross-validation is used to divide the data into training and testing sets. The minimal Redundancy Maximal Relevance (mRMR) method is then applied to the training set to select a subset of the 122 most relevant features.

Finally, because end-to-end training on small-sample data sets is prone to over-fitting, a pre-trained model is used to extract features, and separate classifiers are trained on those features. This involves fewer trainable parameters and reduces the risk of over-fitting. A Convolutional Neural Network (CNN) is trained as the pre-trained model, and its output features are fed into three classifiers, Support Vector Machine (SVM), Deep Forest (DF) and Random Forest (RF), to obtain the final classification results.

The experimental results show that the proposed gastric cancer classification model based on imbalanced DNA methylation data achieves 98.5% accuracy on the TCGA database and 96% accuracy on a self-built database provided by the School of Pharmacy, which indicates good generalization ability. Compared with the best classification model in current research, the accuracy of the proposed model is more than 5% higher.
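The hybrid-sampling idea described in the abstract (oversample the minority class with SMOTE, then clean noisy boundary samples by removing Tomek links) can be sketched as follows. This is a minimal, standard-library-only illustration and not the thesis's implementation: the helper names `nearest`, `smote` and `remove_tomek_links`, the choice of k = 5 neighbours, the decision to drop both members of each Tomek link, and the toy 2-D Gaussian data standing in for methylation features are all assumptions made for the example.

```python
import math
import random


def nearest(idx, points, candidates, k=1):
    """Return up to k indices from `candidates` closest to points[idx] (Euclidean)."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(X, y, minority, k=5, seed=0):
    """Add interpolated minority samples until the two classes are the same size."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_new = (len(y) - len(min_idx)) - len(min_idx)  # majority count minus minority count
    X_out, y_out = [list(x) for x in X], list(y)
    for _ in range(max(n_new, 0)):
        i = rng.choice(min_idx)                     # random minority sample
        j = rng.choice(nearest(i, X, min_idx, k))   # one of its k minority neighbours
        u = rng.random()                            # position along the segment i -> j
        X_out.append([a + u * (b - a) for a, b in zip(X[i], X[j])])
        y_out.append(minority)
    return X_out, y_out


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link (mutual nearest neighbours with different labels)."""
    all_idx = range(len(X))
    nn = [nearest(i, X, all_idx, 1)[0] for i in all_idx]
    drop = {i for i in all_idx if nn[nn[i]] == i and y[i] != y[nn[i]]}
    keep = [i for i in all_idx if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy imbalanced data (hypothetical): 40 majority samples, 8 minority samples.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(8)]
y = [0] * 40 + [1] * 8

X_bal, y_bal = smote(X, y, minority=1)
X_clean, y_clean = remove_tomek_links(X_bal, y_bal)
print(len(X), len(X_bal), len(X_clean))
```

In a cross-validated setup such as the ten-fold scheme mentioned above, resampling of this kind is normally applied to the training folds only, so that synthetic samples do not leak into the held-out fold. The abstract does not state how the thesis ordered these steps.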
Keywords/Search Tags: DNA Methylation, Data Imbalance, Pre-Trained Model, Gastric Cancer Classification Model