Research On Sample-specific Cancer-Associated Gene Mining And Subtype Identification Based On Information Entropy Theory

Posted on: 2021-01-11
Degree: Master
Type: Thesis
Country: China
Candidate: C H Kou
GTID: 2480306524469584
Subject: Computer technology
Abstract/Summary:
Cancer has long been one of the leading threats to human survival; about one-sixth of deaths each year are caused by cancer. Cancer is a heterogeneous disease with extremely complicated pathogenic mechanisms, and different patients often exhibit different mechanisms. It is therefore difficult for traditional diagnosis and treatment methods to provide a precise treatment plan for each cancer patient. Advances in epigenetics and whole-genome sequencing technology make it possible to further explain the occurrence and development of cancer and to study individual patients, so research on personalized medicine based on epigenetics is urgently needed. Based on information entropy theory, this thesis studies cancer-associated pathogenic gene mining and cancer subtype identification from the perspective of the individual cancer patient, as follows:

(1) Based on DNA methylation data, an information gain model is proposed to mine each patient's sample-specific cancer-associated genes (mining of Cancer Sample-Specific associated genes using Information Gain, CSSIG). The model obtains a comprehensive DNA methylation feature at the gene level and measures the specific information carried by each cancer patient in order to identify cancer-associated genes. It yields a sample-specific score for every gene of each cancer patient, and the sample-specific genes of each patient are then screened out through a significance test. Simulation experiments are designed to verify the validity of the information gain model and to determine the feature representation method. In experiments on real biological data, we selected the 31 most significant sample-specific genes and carried out the relevant experimental verification. Through analysis of sample specificity, gene enrichment analysis, and functional analysis, we showed the biological significance of these genes in the samples, which helps in understanding the different disease mechanisms of individual cancer patients.

(2) By fusing multi-omics data, a cancer subtype identification strategy based on sample specificity is proposed. The strategy uses the information gain model to obtain sample-specific scores of the multi-omics data and uses these scores, instead of the original multi-omics data, for cancer subtype identification. Applying this strategy to five classical cancer subtype identification methods, we found that the multi-omics sample-specific scores can optimize the clustering process and enhance the accuracy of cancer subtype identification. Subtype identification results on real data show that the multi-omics sample-specific scores are superior to the original multi-omics data under various evaluation criteria and give more clearly separated curves in survival analysis.
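
For readers unfamiliar with the quantity the abstract builds on, the following is a minimal sketch of textbook information gain, IG(Y; X) = H(Y) - H(Y | X), computed from Shannon entropy. It is not the CSSIG model, whose exact formulation is not given in this abstract. The function names, the "high"/"low" binarization of methylation, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(labels, feature):
    """H(labels) - H(labels | feature) for two aligned discrete sequences."""
    total = len(labels)
    # Group the labels by the value the feature takes for each sample.
    groups = {}
    for y, x in zip(labels, feature):
        groups.setdefault(x, []).append(y)
    # Conditional entropy: group entropies weighted by group size.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data (assumption): 8 samples, methylation binarized per gene.
labels = ["tumor"] * 4 + ["normal"] * 4
gene_a = ["high"] * 4 + ["low"] * 4   # separates tumor from normal perfectly
gene_b = ["high", "low"] * 4          # carries no information about the label

ig_a = information_gain(labels, gene_a)
ig_b = information_gain(labels, gene_b)
print(f"gene_a: {ig_a:.3f} bits, gene_b: {ig_b:.3f} bits")
```

In this toy setting a gene whose methylation state fully separates the two groups attains the maximum gain of one bit, while an uninformative gene scores zero. A per-sample, per-gene score such as the one the thesis describes would need a different construction than this population-level quantity.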
Keywords/Search Tags:sample specificity, information gain, cancer associated genes, cancer subtypes, multi-omics data