| Disease affects the normal work and life of human beings, and complex diseases, with cancer as a representative example, seriously threaten human lives. Research on disease biomarkers helps uncover the underlying pathogenesis of disease and guides personalized treatment. As a result, it has attracted growing attention and has become an important branch of bioinformatics. In recent years, with the development of high-throughput sequencing technology, many kinds of biological data have become available in large quantities, and research on disease biomarkers has entered a new phase. Known disease genes, which have been clinically demonstrated to be associated with disease, can serve as a priori knowledge to guide this research. However, most current methods use known disease genes only as validation data, not as a priori knowledge to guide the identification of disease biomarkers. In this paper, we introduce known disease genes as a priori knowledge. The accumulation of biological data has also advanced the study of computational methods. Researchers tend to predict disease biomarkers by using a network model to calculate the distance between candidate genes and known disease genes, based on the biological hypothesis that “proteins associated with the same disease interact extensively with each other”. These methods include shortest path, random walk, and diffusion kernel, among others. Of these, the diffusion kernel stands out for its ability to take the global topological structure of the network into account. We therefore use the diffusion kernel to measure the distance between genes when constructing disease networks. In this paper, we propose a network-based approach to identify cancer biomarkers. The procedure is as follows. First, published cancer genes for each specific cancer are collected from three public databases. 
For each cancer, we obtain an initial gene set in which each gene is reported in at least one of these three databases. Then, six cancer molecular networks are constructed from the six initial gene sets, with the diffusion kernel employed as the measure of similarity between genes. Finally, we cluster the constructed networks using the MCL algorithm, score candidate genes, and predict marker genes. Additionally, we compare the topological properties of published disease genes and non-disease genes, and analyze the topological properties of the constructed networks. Experimental results show that our method is effective and can reliably identify cancer biomarkers. Compared with random walk with restart, our method performs better at distinguishing cancer samples from normal samples, and the potential disease genes we predict show more significant enrichment in published disease gene databases. Importantly, our method will enhance the biological interpretation of protein interactions and provide insights into cancer mechanisms. |
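
To illustrate how a diffusion kernel reflects the global topology of a network, the following is a minimal sketch that assumes the standard form K = exp(-βL), where L is the graph Laplacian. The abstract does not state the exact formulation or parameters used, so the function name `diffusion_kernel`, the value of `beta`, and the four-node toy network are all hypothetical choices made for illustration only.

```python
import numpy as np


def diffusion_kernel(adjacency, beta=0.5):
    """Return K = exp(-beta * L) for an undirected graph.

    Assumes the standard diffusion-kernel definition with the
    unnormalized Laplacian L = D - A; `beta` is an illustrative value,
    not one taken from the paper.
    """
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A          # graph Laplacian
    # L is symmetric, so the matrix exponential follows from its
    # eigendecomposition: exp(-beta L) = V diag(exp(-beta w)) V^T
    w, V = np.linalg.eigh(L)
    return (V * np.exp(-beta * w)) @ V.T


# Hypothetical toy network: a path g0 - g1 - g2 - g3
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
K = diffusion_kernel(A, beta=0.5)
print(np.round(K, 3))
```

In this toy network, the kernel value between g0 and each other gene decreases as the path between them grows longer, yet it stays positive even for genes that are not directly connected. This is the sense in which the kernel accounts for all paths in the network and not only the shortest one.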