The Analysis Of Data Mining Based On Inherited Disease Mutations

Posted on: 2018-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: C C Wang
GTID: 2334330515983864
Subject: Computer application technology
Abstract/Summary:
With advances in technology and falling costs, genome sequencing has been applied to Mendelian genetic diseases, complex diseases, and cancer gene detection, and has yielded a large amount of sequencing data. These data are of great significance for studying the pathogenesis of various diseases, for clinical diagnosis, and for developing personalized therapeutics. The molecular pathogenesis of more than 4000 human genetic diseases remains unclear. Studies have shown that the mechanism of inherited diseases is closely related to alternative splicing. The splice site is one of the important regulatory elements of the alternative splicing mechanism, so studying pathogenic mechanisms at the level of the splice site plays an important role in understanding the pathogenesis of genetic diseases. In this thesis, sequential pattern mining models are used to study pathogenic mutations within splice site regions.

Cancer is one of the greatest threats to human health. Identification of potential oncogenes and tumor suppressor genes has not only advanced our understanding of the genetic basis of tumorigenesis and cancer progression, but has also significantly enabled the development of personalized cancer therapeutics. Genomic sequencing studies over the past several years have yielded a large number of cancer somatic mutations, but interpreting cancer sequence information remains a major challenge. In previous research, the most commonly used approach to distinguishing the small number of driver mutations from background passenger mutations has been to identify significantly mutated genes in a cohort study. To complement this approach, various computational tools have been developed to assess the effects of missense mutations on protein function, but their utility is limited. In this thesis, we use the overlap between somatic mutations in COSMIC and pathogenic genetic variants in HGMD to identify potentially novel cancer drivers. We hypothesize that these shared mutations are more likely to be cancer drivers because they have established molecular mechanisms for affecting protein function.

The main work of this thesis is as follows:

(1) Sequential pattern-based prediction of splicing effects for germline variants within splice site regions. We integrated sequential pattern mining with position weight matrices. The results show that the model classifies well when distinguishing pathogenic genetic variants in genetic diseases from common variants, and that pathogenic mutations in splice sites weaken the splice site signal, disrupting normal splicing and leading to disease.

(2) Identification of cancer oncogenes and tumor suppressor genes based on mutations that cause genetic diseases. We identify potentially novel oncogenes and tumor suppressor genes from somatic mutations that overlap with known pathogenic mutations in Mendelian diseases. We first show that the overlap between cancer somatic mutations and pathogenic genetic variants is associated with high mutation frequency in cancers and is enriched for known cancer genes. We then identify putative tumor suppressors based on the number of distinct overlapping mutations in a given gene; our results suggest that ion channels, collagens, and Marfan syndrome-associated genes may represent new classes of tumor suppressors. To elucidate potentially novel oncogenes, we identified overlapping mutations that are not only highly recurrent but also mutually exclusive with previously characterized oncogenic mutations in each specific cancer type. Our study shows that HGMD/COSMIC overlapping mutations can be used to discover new cancer genes from the vast amount of cancer genome sequencing data.
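
The overlap analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's actual pipeline: the record layout (gene, chromosome, position, ref, alt), the function names, and all toy values below are assumptions made up for illustration, and none of them are real HGMD or COSMIC entries. The sketch intersects a set of germline pathogenic variants with a table of somatic mutations, counts distinct overlapping mutations per gene (the tumor suppressor criterion), and picks out the most recurrent overlapping mutation (one part of the oncogene criterion; mutual exclusivity is not modeled here).

```python
from collections import defaultdict

# Hypothetical toy records keyed as (gene, chromosome, position, ref, alt).
# Placeholder values only; these are NOT real HGMD or COSMIC entries.
hgmd_variants = {
    ("GENE_A", "1", 1000, "C", "T"),
    ("GENE_A", "1", 1050, "G", "A"),
    ("GENE_B", "2", 2000, "A", "G"),
}

# Somatic mutations mapped to the number of tumor samples carrying them.
cosmic_mutations = {
    ("GENE_A", "1", 1000, "C", "T"): 12,
    ("GENE_A", "1", 1050, "G", "A"): 3,
    ("GENE_B", "2", 2000, "A", "G"): 40,
    ("GENE_C", "3", 3000, "T", "C"): 5,
}


def overlapping_mutations(germline, somatic):
    """Keep somatic mutations that also occur in the germline pathogenic set."""
    return {mut: count for mut, count in somatic.items() if mut in germline}


def distinct_overlaps_per_gene(overlap):
    """Count how many distinct overlapping mutations fall in each gene."""
    counts = defaultdict(int)
    for gene, *_ in overlap:
        counts[gene] += 1
    return dict(counts)


overlap = overlapping_mutations(hgmd_variants, cosmic_mutations)
per_gene = distinct_overlaps_per_gene(overlap)

# Genes with many distinct overlapping mutations: tumor-suppressor-like signal.
ranked_genes = sorted(per_gene.items(), key=lambda kv: (-kv[1], kv[0]))

# A single highly recurrent overlapping mutation: oncogene-like signal.
most_recurrent = max(overlap, key=overlap.get)

print(ranked_genes)
print(most_recurrent, overlap[most_recurrent])
```

Keying mutations on genomic coordinates plus alleles is one possible choice; matching on gene and amino acid change instead would merge different nucleotide changes that produce the same protein change.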
Keywords/Search Tags: splice site, sequential pattern mining, genetic diseases, cancer driver mutation