
Study Of The Method Of Mining The Patterns Of Driver Mutation In Pan-cancer

Posted on: 2019-06-29
Degree: Master
Type: Thesis
Country: China
Candidate: X Qian
Full Text: PDF
GTID: 2404330545964985
Subject: Software engineering
Abstract/Summary:
Cancer is a common disease with complex pathogenic mechanisms. Neither academia nor medicine has yet found an effective cure, but the enthusiasm and urgency of the effort against cancer have never diminished. With the development of high-throughput genome projects and pan-cancer analysis technology, researchers have gradually recognized the important effect of somatic mutations on cancer formation. Functional somatic mutations within coding amino acid sequences confer a selective expression advantage during cancer pathogenesis, and this advantage is likely to cause cells, tissues, or organs to become cancerous. However, most existing methods for identifying cancer-related mutations focus on either the single amino acid level or the entire gene level, whereas gain-of-function mutations often cluster in specific protein regions rather than occurring independently along the amino acid sequence.

To identify the somatic driver mutation clusters in the amino acid sequence that can promote the formation of cancer, this thesis proposes two driver mutation pattern mining methods based on somatic mutation clustering. They use a data-adaptive kernel density estimation model and hotspot mutation identification, respectively, to detect somatic mutation clusters on amino acid sequences. The main work comprises the following two points:

1) A driver mutation pattern mining method named DMCM (Data-adaptive Mutation Clustering Method) is proposed, based on adaptive kernel density estimation. It modifies the traditional kernel density estimation model, which relies on a fixed kernel bandwidth. First, a data-adaptive kernel bandwidth is constructed to form an adaptive kernel density estimation model. Second, the model is used to estimate the mutation density of pan-cancer mutation data, and a Gaussian distribution model is used to determine the boundaries of the mutation clusters. Finally, the EM algorithm is used to optimize the cluster boundaries, yielding the final somatic mutation clusters. The experimental results show that DMCM is highly robust and that the identified mutation clusters are of driver significance.

2) A driver mutation pattern mining method named HMCM (Hotspot Mutation Clustering Method) is proposed, based on hotspot mutation clustering. It addresses the defect of traditional hotspot mutation identification methods, which consider only single amino acid mutations. First, a scoring system for mutation clusters is constructed using statistical methods. Second, the mutation hotspot region is extended from a single amino acid position toward both sides of the amino acid sequence, and the cluster scores are updated continually until they converge to their maximum values. By separating the missense and nonsense mutations in the pan-cancer mutation data set, HMCM is shown to be able to identify and distinguish oncogenic driver mutation clusters and tumor suppressor mutation clusters. The experimental results show that the method is feasible.

Thus, the proposed DMCM and HMCM methods provide new methods and ideas for research on the pathogenesis of cancer and are of great significance to the development of human health.
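
To make the idea of a data-adaptive bandwidth concrete, the following is a minimal Python sketch of a generic two-stage adaptive Gaussian kernel density estimate over amino acid positions. It is not the DMCM implementation, whose bandwidth construction, Gaussian boundary step, and EM refinement are not specified in this abstract. The function names, the pilot bandwidth of 5 residues, the sensitivity exponent `alpha = 0.5`, and the toy positions are all assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density evaluated element-wise."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def adaptive_kde(positions, grid, pilot_bandwidth=5.0, alpha=0.5):
    """Two-stage adaptive KDE (illustrative, not the thesis's DMCM).

    positions: mutated amino acid positions (one entry per mutation).
    grid: positions at which the density is evaluated.
    Returns (density on grid, per-mutation bandwidths).
    """
    x = np.asarray(positions, dtype=float)
    grid = np.asarray(grid, dtype=float)

    # Stage 1: fixed-bandwidth pilot density at each mutation position.
    u = (x[:, None] - x[None, :]) / pilot_bandwidth
    pilot = gaussian_kernel(u).mean(axis=1) / pilot_bandwidth

    # Stage 2: shrink the bandwidth where the pilot density is high and
    # widen it where the pilot density is low, relative to the geometric mean.
    geometric_mean = np.exp(np.mean(np.log(pilot)))
    local_factor = (pilot / geometric_mean) ** (-alpha)
    bandwidths = pilot_bandwidth * local_factor

    # Final estimate: average of kernels, each with its own bandwidth.
    v = (grid[:, None] - x[None, :]) / bandwidths[None, :]
    density = (gaussian_kernel(v) / bandwidths[None, :]).mean(axis=1)
    return density, bandwidths


# Toy data (hypothetical): a dense cluster near residue 100 plus two
# isolated mutations at residues 30 and 170.
toy_positions = [98, 99, 100, 100, 100, 101, 102, 30, 170]
toy_grid = np.arange(0.0, 201.0)
toy_density, toy_bandwidths = adaptive_kde(toy_positions, toy_grid)
print("densest grid position:", toy_grid[np.argmax(toy_density)])
print("per-mutation bandwidths:", np.round(toy_bandwidths, 2))
```

In this sketch, mutations inside a dense region receive a narrower kernel than isolated mutations, so a tight cluster is not smeared out by a bandwidth that suits the sparse parts of the sequence. Determining cluster boundaries from the resulting density would be a separate step.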
Keywords/Search Tags: Cancer, Somatic mutation, Kernel density estimate clustering, Hotspot mutation clustering, Driver pattern