| Cancer is one of the deadliest diseases in the world, and its complex pathogenesis remains under active study. It is widely believed that cancer is caused by somatic mutations: a few confer a growth advantage, are positively selected, and cause tumours, while the vast majority are neutral and do not cause cancer. Distinguishing the mutations that contribute to the development and progression of cancer is therefore one of the main goals of current cancer treatment. To this end, many driver gene identification algorithms have been developed. In recent years, network-based driver gene identification methods have become increasingly important and have achieved good results. However, these methods have limitations. For example, methods that identify key nodes by network centrality ignore the attributes of gene nodes, the topological influence between neighbors, the influence of prior knowledge, and the properties of the network itself. Moreover, omics data other than genomic data also need to be considered. To address these issues, the main work is summarized as follows: 1. A new cancer driver gene identification algorithm is proposed, which combines a semi-local centrality measure with the mutational effect between genes to better evaluate the influence of gene mutations on expression changes. First, the mutation frequency of each gene in the patient population is calculated, from which the mutational effect between genes is obtained. Second, the gene expression values of cancer samples and normal samples are compared to obtain differentially expressed genes. For each mutated gene, the associated differentially expressed genes are mapped onto a molecular network. Finally, the ranking vector of each mutated gene in the network is calculated according to the objective function to obtain the candidate driver genes. Compared with existing methods, this
method achieves good accuracy on data from four cancer types. In addition, clinical correlation analysis and pathway analysis show that the driver genes identified by this method have good clinical relevance. 2. An improved random walk algorithm is proposed that obtains local and global vectors of nodes in a weighted network by controlling the step size, so as to better characterize the gene nodes in the network. First, expression data and mutation data are processed separately: the Pearson correlation coefficient between genes is calculated from the expression data, the mutation data are converted into a matrix to calculate a mutation score, and the two scores are integrated into the network weights. Then, feedback centrality is used as the jump (restart) probability of the random walk with restart, and local and global views of the network are obtained by controlling the step size. Finally, the result vectors obtained with different step sizes are superimposed and integrated into a final score for each mutated gene, from which the candidate driver genes are selected. Experimental results on data from four cancer types show that this method identifies more driver genes than other methods. |
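
The second contribution can be illustrated with a minimal sketch, which is not the author's implementation. It builds a weighted gene network from Pearson correlations and mutation scores, runs a random walk with restart for several step counts, and averages the resulting vectors into one score per gene. The function names, the toy data, and the rule for combining correlation with mutation score are all assumptions made for illustration. The sketch also substitutes a single scalar restart probability for the feedback centrality used in the work, since the abstract does not specify how that centrality is computed.

```python
import numpy as np

def pearson_weighted_network(expr, mut_score):
    """Build edge weights from expression and mutation data.

    expr: genes x samples expression matrix.
    mut_score: one non-negative mutation score per gene.
    """
    corr = np.abs(np.corrcoef(expr))   # |Pearson r| between gene rows
    np.fill_diagonal(corr, 0.0)        # drop self-loops
    # Assumed integration rule (not from the source): scale each edge
    # by the mean mutation score of its two endpoints.
    pair_score = (mut_score[:, None] + mut_score[None, :]) / 2.0
    return corr * pair_score

def rwr_multi_step(weights, seed, restart, step_sizes):
    """Random walk with restart, stopped after several step counts.

    Small step counts capture local structure, large ones approach the
    global stationary vector; the per-step-count vectors are averaged.
    restart is a scalar here (assumption; the source uses feedback centrality).
    """
    col_sums = weights.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # avoid division by zero for isolated nodes
    transition = weights / col_sums    # column-normalized transition matrix
    seed = seed.astype(float)
    total = np.zeros_like(seed)
    for steps in step_sizes:
        p = seed.copy()
        for _ in range(steps):
            p = (1.0 - restart) * (transition @ p) + restart * seed
        total += p
    return total / len(step_sizes)

# Toy data: 5 hypothetical genes, 20 samples.
rng = np.random.default_rng(0)
expr = rng.normal(size=(5, 20))
mut_score = np.array([0.4, 0.1, 0.3, 0.1, 0.1])

weights = pearson_weighted_network(expr, mut_score)
seed = mut_score / mut_score.sum()     # start the walk at mutated genes
scores = rwr_multi_step(weights, seed, restart=0.3, step_sizes=(1, 3, 50))
ranking = np.argsort(-scores)          # gene indices, highest score first
```

Averaging over step counts is one simple way to superimpose local and global vectors; a weighted sum would work equally well if some scales should count more than others.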