
Research On The Model And Algorithm For Identifying Pan-cancer Common Drive Pathway

Posted on: 2021-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: K Pan
Full Text: PDF
GTID: 2404330629953120
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of high-throughput sequencing technology, it has become possible to study the pathogenesis of cancer at the molecular level. Research has indicated that multiple driver genes may target the same cellular signaling or regulatory pathway, i.e., a mutation in any one of the driver genes in the pathway can dysregulate the pathway and cause cancer. Studying mutations at the pathway level is therefore more biologically meaningful for capturing heterogeneous mutation patterns and understanding cancer formation. This gives rise to the problem of identifying driver pathways, and identifying the driver pathways common to different cancers has become one of its important subproblems. This thesis studies that subproblem, and the main work is as follows.

Zhang et al. proposed the ComMDP method for this problem in 2017. ComMDP builds its model by accumulating the absolute weight of each cancer type and searches for the pathway with the largest cumulative value. However, because the number of samples usually differs considerably between cancer types, cancers with small sample sizes may be neglected when absolute weights are accumulated, which degrades identification and causes some driver pathways to be missed. To address this, the thesis proposes accumulating relative weights instead, and adopts the variance or the harmonic mean to minimize the dispersion of the relative weights. Based on this idea, two pan-cancer common driver pathway identification models are presented: MDP1 (Multi-cancer driver pathway 1) and MDP2 (Multi-cancer driver pathway 2).

Because the MDP1 and MDP2 models are NP-hard, two methods based on intelligent optimization algorithms are proposed. By introducing a short chromosome code and a recombination operator based on a greedy strategy, the parthenogenetic algorithms PGA-MDP1 and PGA-MDP2 are put forward for solving the models. By introducing a binary particle code, a representation of particle velocity, and particle operations, the particle swarm optimization algorithms PSO-MDP1 and PSO-MDP2 are proposed.

The identification methods ComMDP, PGA-MDP1, PGA-MDP2, PSO-MDP1 and PSO-MDP2 were compared on simulated data and real biological data to assess the effectiveness of the proposed models and algorithms. First, simulated data were used to compare ComMDP, PGA-MDP1 and PGA-MDP2. The results show that the method based on the MDP2 model achieves higher identification accuracy than both the method based on the MDP1 model and ComMDP. Second, simulated data were used to compare the accuracy and running time of PGA-MDP2 and PSO-MDP2. The results show that methods based on the same model but different intelligent optimization algorithms have essentially the same identification accuracy and differ mainly in execution efficiency. Both PGA-MDP2 and PSO-MDP2 scale well, i.e., they still perform well on large-scale problems, and PGA-MDP2 is more efficient than PSO-MDP2. Finally, real biological data were used to compare the identification performance of ComMDP, PGA-MDP1, PGA-MDP2, PSO-MDP1 and PSO-MDP2. Compared with ComMDP, the methods proposed in this thesis identify some biologically significant driver pathways that ComMDP misses.

In summary, this thesis studies the problem of identifying pan-cancer common driver pathways and proposes two effective identification models together with algorithms for solving them. The experimental results indicate that the proposed models and algorithms can identify some biologically significant driver pathways missed by ComMDP. The presented models and algorithms may therefore become useful supplementary tools for identifying cancer pathways.
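
The contrast between accumulating absolute weights and accumulating relative weights can be illustrated with a small sketch. The abstract gives neither the weight function nor the exact MDP1/MDP2 objectives, so everything below is an assumption made for illustration: the per-cancer weight is taken to be a Dendrix-style "coverage minus overlap" score, the relative weight is taken to be that score divided by the cohort's sample count, and the harmonic mean is used to combine cohorts. The names `dendrix_weight`, `absolute_score`, `relative_score` and the toy `cohorts` data are hypothetical and not taken from the thesis.

```python
from statistics import harmonic_mean

def dendrix_weight(mutations, gene_set):
    """Assumed weight: 2 * |samples covered by the set| - sum of per-gene mutation counts."""
    covered = set().union(*(mutations.get(g, set()) for g in gene_set))
    total = sum(len(mutations.get(g, set())) for g in gene_set)
    return 2 * len(covered) - total

def absolute_score(gene_set, cohorts):
    """Sum of raw weights over cancer types; large cohorts dominate the sum."""
    return sum(dendrix_weight(c["mut"], gene_set) for c in cohorts.values())

def relative_score(gene_set, cohorts):
    """Harmonic mean of per-cohort weights divided by cohort size (assumed normalization).

    statistics.harmonic_mean rejects negative inputs, so this sketch only
    handles gene sets whose weight is non-negative in every cohort.
    """
    rel = [dendrix_weight(c["mut"], gene_set) / c["n"] for c in cohorts.values()]
    return harmonic_mean(rel)

# Toy data: gene -> set of mutated sample indices, for one large and one small cohort.
cohorts = {
    "large": {"n": 100, "mut": {
        "g1": set(range(0, 40)), "g2": set(range(40, 80)),
        "g3": set(range(0, 30)), "g4": set(range(30, 60)),
    }},
    "small": {"n": 10, "mut": {
        "g1": {0},
        "g3": set(range(0, 5)), "g4": set(range(5, 10)),
    }},
}

set_a = {"g1", "g2"}  # strong in the large cohort, nearly absent in the small one
set_b = {"g3", "g4"}  # moderate in the large cohort, covers the whole small cohort

abs_a, abs_b = absolute_score(set_a, cohorts), absolute_score(set_b, cohorts)
rel_a, rel_b = relative_score(set_a, cohorts), relative_score(set_b, cohorts)
print(abs_a, abs_b)   # absolute accumulation ranks set_a first
print(rel_a, rel_b)   # relative accumulation ranks set_b first
```

In this toy example the absolute sum prefers the gene set that is strong only in the large cohort, while the size-normalized harmonic mean prefers the set that is mutated consistently in both cohorts, which is the kind of small-cohort effect the abstract describes.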
Keywords/Search Tags:Cancer, Pan-cancer data, Parthenogenetic algorithm, Particle swarm optimization, Common driver pathway, Complex disease