| Data mining is a major research area in bioinformatics, especially for cancer, a common disease caused by gene mutations. Gene therapy is a new therapeutic tool for treating diverse types of disease, including cancer, congenital disorders, and infectious diseases. With the development of next-generation DNA sequencing, which enables genome-wide measurement of somatic mutations in large numbers of cancer patients, The Cancer Genome Atlas (TCGA) has grown rapidly. Large-scale cancer genomics projects are providing a large volume of data about genomic and gene expression aberrations in multiple cancer types, and extracting useful information from TCGA has become an active research topic. A key challenge in interpreting these data is to distinguish driver mutations, which are important in the development of cancer, from random passenger mutations. Vandin et al. proposed two combinatorial properties, coverage and exclusivity, that distinguish driver pathways, or groups of driver genes, from groups of genes carrying passenger mutations. The traditional approach, based on a probability model, is to find the genes that are mutated with higher probability. However, this kind of approach remains largely theoretical because of the heterogeneity of gene mutations. The best-known de novo methods for identifying driver pathways in cancer include De novo Driver Exclusivity (Dendrix) and the Genetic Algorithm (GA). Both are flexible: Dendrix adopts a Markov chain Monte Carlo (MCMC) algorithm, while GA is a stochastic method that can incorporate other types of information to improve on the first approach. Both, however, suffer from premature convergence and slow convergence. To overcome these problems, this paper focuses on identifying driver pathways in cancer and proposes several robust de novo methods. The main contributions are summarized as follows: (1) Chaos and Multi-population Genetic Algorithm (CMGA). It is 
well known that somatic mutation is an important factor in the development of cancer. In this study, we propose CMGA as an efficient way to solve the maximum weight submatrix problem, which is designed to find important mutated driver genes in cancer. With this method, population diversity can be increased and premature convergence might be avoided, and information can be exchanged among the subpopulations. By introducing a chaos operator, whose inherent randomness and ergodicity help the search escape local optima, CMGA overcomes the premature convergence of GA. In addition, CMGA is applied to somatic mutation data and four clinical data sets. (2) CMGA takes only a single pathway into account, but driver mutations generally target cellular signaling and regulatory pathways consisting of multiple genes. Because different combinations of mutations in a driver pathway are observed in different samples, this heterogeneity complicates the identification of driver pathways by their recurrence across samples. Here we introduce the Co-occurring Chaos and Multi-population Genetic Algorithm (CCMGA) for the simultaneous identification of multiple driver pathways. The algorithm was applied to simulated data and several real biological data sets. The discovered co-occurring driver pathways are involved in several key signaling processes. |
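
The maximum weight submatrix problem mentioned above can be illustrated with a minimal sketch. It assumes the weight defined by Vandin et al., W(M) = 2|Γ(M)| − Σ_{g∈M} |Γ(g)|, where Γ(g) is the set of samples in which gene g is mutated and Γ(M) is the set of samples covered by at least one gene in M. The function name `coverage_exclusivity_weight` and the toy matrix are hypothetical and only illustrate the scoring; they are not the CMGA implementation, which searches over gene sets to maximize this kind of score.

```python
import numpy as np


def coverage_exclusivity_weight(A, genes):
    """Score a gene set on a binary mutation matrix A (samples x genes).

    Returns 2 * (number of samples covered by the gene set)
    minus (total number of mutations in the gene set).
    High coverage raises the score; overlapping mutations lower it.
    """
    sub = A[:, genes]                                  # columns of the chosen genes
    covered = int(np.count_nonzero(sub.any(axis=1)))   # |Gamma(M)|
    total = int(sub.sum())                             # sum over g of |Gamma(g)|
    return 2 * covered - total


# Hypothetical toy data: 5 samples (rows) x 4 genes (columns).
A = np.array([
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
])

# Genes 0, 1, 2 cover every sample exactly once (mutually exclusive).
w_exclusive = coverage_exclusivity_weight(A, [0, 1, 2])
# Genes 0, 1, 3 overlap in several samples and miss one sample.
w_overlap = coverage_exclusivity_weight(A, [0, 1, 3])
```

In this toy case the mutually exclusive, fully covering set scores higher than the overlapping one, which is the behavior the coverage and exclusivity properties are meant to reward.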