
Deep Neural Network Model Based On Methylation Difference For Tumor Classification And Early Diagnosis

Posted on: 2019-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: Q Z Zhang
GTID: 2404330545491933
Subject: Cell biology
Abstract/Summary:
Background and purpose of research: DNA methylation is one of the most common and important subjects of recent epigenetics research. It is closely related to the occurrence and development of tumors and shows distinct characteristics in the early stage of tumor formation. In addition to the dramatic changes in the methylation level of the tumor genome compared with the normal genome, there are also significant differences among tumor types. Therefore, the level of methylation change can be used as a specific molecular marker to distinguish between tumor types. At present, specific protein molecular markers, CT images and other pathological indicators are used to distinguish between tumor types. However, these methods are not very accurate, and tumors in the initial stage of formation are difficult to detect with pathological indicators. The identification of early tumors currently focuses mostly on the detection of circulating tumor DNA (ctDNA) in plasma. However, the technology is not yet mature, the cost is relatively high, and little sequencing data has been published. In genomics analysis, although sequencing technology is maturing and many mechanisms of tumorigenesis have been explained in molecular biology, current research on the tumor genome mostly focuses on mutations in a few oncogenes and tumor suppressor genes. There is no very effective bioinformatics method for differentiating tumor types or for predicting early tumors. Finding a solution to this problem is therefore particularly important.

Research methods and results: The data used in this project are the Illumina 450K data of 24 tumor types in the TCGA database, the WGBS data in the GEO database and the Roadmap Epigenomics database, and simulated data. For the Illumina 450K data, the calculated beta value was used as the methylation level. For the WGBS data, reads were aligned to the reference genome using sequence alignment software; base mutations at CpG sites were then corrected and the data were further processed with a Gaussian function method, eliminating the influence of base mutations and of adjacent CpG sites in order to obtain a more accurate methylation status. The correlation coefficient of the methylation status at the same sites of the same tissue type was calculated between the two types of data, and it showed that the two are very highly correlated and can be used together. The data were first screened for differentially methylated sites in tumor tissues using statistical measures, namely mean absolute difference (MAD), standard deviation (SD), and standard score (Z-score), together with various genomic functional regions. After removing sites showing similar methylation status, sites with no significant change in methylation status compared with the corresponding normal tissue, and sites not falling in clearly defined functional regions, a total of 1894 sites with significant differences in methylation status were obtained. These final CpG loci were used as features to construct the training set of the deep neural network model. Separate deep neural network models were constructed for the differentiation of tumor types and for the prediction of early-stage tumors. Both models have the same structure, containing one input layer, five hidden layers and one output layer. The sigmoid function is used as the activation function and the Kronecker delta is used to construct the label matrix. The difference is that the early-stage identification model has different input features and a different learning rate. Sites specific to leukocytes and leukemias are removed from its input features, because the simulated data are generated by mixing data from various types of tumor tissue with normal leukocyte data in a certain ratio. To make the simulated data as realistic as possible, the proportion of tumor tissue in the mixture is kept small, so the effect of sites that differ specifically between the two tissues needs to be removed. The two models were trained, calibrated and compared with existing models, including KNN, naive Bayes, logistic regression, SVM and random forest; the accuracy was satisfactory in all cases.

Research conclusion: By using a large amount of database data and bioinformatics data analysis techniques, this project finds that the methylation of the tumor genome changes dramatically. By correcting base mutations at CpG sites and processing the WGBS data with a sliding window method, the effects of base mutations and of adjacent CpG sites are eliminated. We extracted the methylation-specific sites of various tumors and constructed deep neural network (DNN) models by combining traditional statistics with deep neural networks. After training, calibration and verification on a large amount of real and simulated data, and after comparing performance with existing models, two deep neural network models with higher accuracy, named TTR_DNN and ETP_DNN, were obtained. They can help to distinguish tumor types and to predict early tumors.
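
The model ingredients named in the abstract (a Kronecker-delta label matrix, tumor/leukocyte mixing for simulated samples, and a sigmoid network with five hidden layers) can be illustrated with a minimal NumPy sketch. This is not the thesis code. The hidden-layer widths, the random weight initialization, the sigmoid on the output layer, the linear mixing of beta values, and all function names are assumptions made for illustration; only the 1894 input sites, the 24 tumor types, the five hidden layers and the sigmoid activation come from the abstract.

```python
import numpy as np

def kronecker_labels(class_ids, n_classes):
    """Label matrix Y with Y[i, j] = delta(class_ids[i], j), i.e. one row per sample."""
    class_ids = np.asarray(class_ids)
    return (class_ids[:, None] == np.arange(n_classes)[None, :]).astype(float)

def mix_with_leukocytes(tumor_beta, leukocyte_beta, tumor_fraction):
    """Assumed linear mixture of beta values to imitate a low-tumor-fraction sample."""
    return tumor_fraction * tumor_beta + (1.0 - tumor_fraction) * leukocyte_beta

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_network(layer_sizes, rng):
    """One (weights, bias) pair per layer transition; small random weights (assumption)."""
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def forward(x, params):
    """Forward pass with a sigmoid after every layer, including the output (assumption)."""
    activation = x
    for weights, bias in params:
        activation = sigmoid(activation @ weights + bias)
    return activation

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 1894 input sites and 24 tumor types are from the abstract;
    # the five hidden-layer widths below are hypothetical.
    layer_sizes = [1894, 512, 256, 128, 64, 32, 24]
    params = init_network(layer_sizes, rng)

    tumor = rng.uniform(0.0, 1.0, size=(4, 1894))      # fake tumor beta values
    leukocyte = rng.uniform(0.0, 1.0, size=(4, 1894))  # fake leukocyte beta values
    simulated = mix_with_leukocytes(tumor, leukocyte, tumor_fraction=0.05)

    scores = forward(simulated, params)        # shape (4, 24), untrained
    labels = kronecker_labels([0, 3, 23, 3], 24)
    print(scores.shape, labels.shape)
```

The sketch covers only the forward pass and data preparation. Training (loss function, learning rate, optimizer) is not specified in the abstract and is left out rather than guessed.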
Keywords/Search Tags:Tumor, Methylation, Deep Neural Networks, Early Diagnosis