| Cancer has become the second leading threat to human life and health, and metastasis is the primary cause of cancer death. In clinical diagnosis and treatment, the occurrence of metastasis is a crucial prognostic indicator that directly affects subsequent treatment decisions and medical interventions for patients. Research on predicting metastasis events and metastatic lesions provides an accurate prognostic reference for cancer clinics, enabling timely medical intervention and thereby prolonging patient survival. In recent years, metastasis research has expanded from single cancers to pan-cancer studies, which explore molecular patterns shared among cancers by analyzing the similarities between cancer types. Identifying new clinical target genes from the genetic characteristics of multi-omics data across cancer types, and exploring in depth the mechanisms and roles of existing target genes in different cancers, provides a new scientific basis for metastasis-related prognosis and medical intervention. Metastasis research nevertheless still faces major challenges. The first concerns data quality: current cancer multi-omics data suffer from high noise, strong batch effects, and class imbalance, which seriously affect the accuracy of gene feature analysis, while gene feature generation and feature enhancement methods tailored to cancer data are lacking. The second concerns analysis models: traditional metastasis research models lack methods that fully integrate prior knowledge, as well as designs that model the structure of gene regulatory networks and the characteristics of the sample set, which further increases the difficulty of related research. At present, the bottlenecks for metastasis-related gene identification and metastasis prediction lie in unsatisfactory performance and the limited
biological interpretability of existing studies, which makes derivative problems such as predicting metastatic lesions even harder to study. To address these shortcomings of traditional metastasis prediction methods, this research proposes methods for identifying and predicting metastasis-related genes based on pan-cancer multi-omics data, focusing on four key scientific problems: metastasis-related gene identification, metastasis event prediction, metastatic lesion-related gene identification, and metastatic lesion prediction. Several metastasis prediction models and algorithm tools are established, aiming to break through the performance bottleneck of traditional prediction models while accounting for the biological interpretability of the models and providing effective support for predicting clinical cancer events. The main contributions of this article are as follows. First, in metastasis-related gene identification, outliers in heterogeneous pan-cancer multi-omics data hinder gene feature integration, and the lack of a negative set of metastasis-related genes limits prediction accuracy. To address these problems, a new method based on semi-supervised clustering of pan-cancer multi-omics data is proposed. Extracting multi-omics features for different cancer types and applying feature fusion and dimensionality reduction effectively reduced the influence of outlier noise on the data distribution. By making full use of prior knowledge about gene relationships and a gold-standard set of metastasis-related genes, a metastasis-related gene identification method based on semi-supervised K-Medoids clustering was established, achieving high-precision identification. Repeated 10-fold experiments were carried out on 12,515 genes from the international authoritative dataset TCGA-CDR; 77% of
identified genes were supported by internationally published literature. In metastasis-related gene identification, the proposed method achieves higher precision than recent comparable methods, represented by EMOGI and PLUS, and than four commonly used semi-supervised learning methods. In-depth functional analysis further shows that high-scoring genes identified by the proposed method, such as MMP2, CD44, and VEGFC, are highly correlated with EMT pathway activation and positive regulation of cell migration, and potential clinical targets such as IL2 and TIMP2 were accurately identified. These results demonstrate the validity and accuracy of the proposed method for metastasis-related gene identification, as well as its reference value for clinical practice. Second, to address the insufficient information in metastasis prediction features and the low prediction precision in the metastasis prediction task, a new metastasis prediction method based on a residual convolutional neural network was proposed. Constructing 2D enhanced weights for related genes enhanced the gene expression features of cancer cases, and modeling a residual convolutional neural network overcame the limitations of existing graph convolutional neural networks applied to 1D gene expression features, ultimately achieving high-precision metastasis prediction. A 10-fold cross-validation experiment on 9,554 cancer cases with clear clinical labels, covering 33 cancer types in the international authoritative dataset TCGA-CDR, showed substantially higher precision for the proposed method than for recent comparable methods, represented by Meta Cancer, and three commonly used machine learning methods. These results demonstrate the validity and accuracy of the proposed method for metastasis prediction, as well as its reference value for clinical practice. Third, to address the issues of ignoring
the directionality of gene regulatory relationships and the low prediction precision of models built on protein interaction relationships in existing research on metastatic lesion-related gene identification, a new method for identifying metastatic lesion-related genes based on a graph convolutional neural network was proposed. Based on the biological meaning of gene regulatory relationships, directed graph convolution kernels were constructed by extracting each gene's co-regulating and co-regulated-by relationships, and a directed graph convolutional neural network was built on these kernels, achieving high-precision identification of metastatic lesion-related genes. In a 10-fold cross-validation experiment on 1,217 genes clearly labeled as metastatic lesion-related in the international authoritative dataset DisGeNET, the proposed method achieved higher precision than recent comparable methods, represented by MTGCN. In semi-supervised experiments on the complete set of 9,875 genes, high-scoring genes identified by the proposed method, such as PICK1, LYPD1, and HYAL3, were confirmed by biological experiments to participate in organ-specific metastasis, and gene targets that inhibit organ-specific metastasis, such as DRG2 and YPEL3, were accurately identified. These results demonstrate the validity and accuracy of the proposed method for metastatic lesion-related gene identification, as well as its reference value for clinical practice. Finally, to address the weak targeting of model designs toward the significant batch effects and class imbalance in pan-cancer samples, as well as the low prediction precision of models in metastatic lesion prediction, a new metastatic lesion prediction method based on a multi-kernel support vector machine was proposed. By mining in depth the cancer gene expression features affected by batch effects and imbalanced class labels, sample-set-characterized
kernel weights were constructed, and a multi-kernel support vector machine was then built to accurately predict potential metastatic lesions. In a 10-fold cross-validation experiment on 620 metastatic cases of 18 cancer types in the international authoritative dataset TCGA-CDR, the proposed method achieved higher precision than recent comparable methods for metastatic lesion prediction, especially for brain metastasis prediction, and greatly improved precision compared with three commonly used machine learning models. These results demonstrate the validity and accuracy of the proposed method for metastatic lesion prediction, as well as its reference value for clinical practice. |
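The semi-supervised K-Medoids step used for metastasis-related gene identification can be illustrated with a minimal sketch. It assumes the fused, dimensionality-reduced multi-omics features are already available as a matrix `X` (one row per gene) and that gold-standard metastasis-related genes are passed as seeds with a fixed cluster id, while unlabeled genes use `-1`. The seeding and constraint scheme, the function name `seeded_kmedoids`, and the toy data are illustrative assumptions rather than the method's actual implementation, which additionally exploits prior gene-relationship knowledge not modeled here.

```python
import numpy as np

def seeded_kmedoids(X, k, seed_labels, n_iter=100):
    """Hypothetical seeded (constrained) K-Medoids.

    X           : (n, d) gene feature matrix (assumed already fused/reduced).
    k           : number of clusters.
    seed_labels : length-n ints; -1 = unlabeled, otherwise a fixed cluster id.
    Returns (labels, medoid_indices).
    """
    X = np.asarray(X, dtype=float)
    seed_labels = np.asarray(seed_labels)
    seeded = seed_labels >= 0
    # Pairwise Euclidean distances between all genes.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    def most_central(members):
        # Medoid = member with the smallest total distance to the others.
        return members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]

    medoids = [-1] * k
    # 1) Seeded clusters start from their most central seed gene.
    for c in range(k):
        members = np.flatnonzero(seed_labels == c)
        if members.size:
            medoids[c] = most_central(members)
    # 2) Unseeded clusters start from the gene farthest from existing medoids.
    for c in range(k):
        if medoids[c] < 0:
            chosen = [m for m in medoids if m >= 0]
            if chosen:
                score = D[:, chosen].min(axis=1)
            else:
                score = np.zeros(len(X))
            medoids[c] = int(np.argmax(score))
    medoids = np.array(medoids)

    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        # Constraint: seed genes never leave their assigned cluster.
        labels[seeded] = seed_labels[seeded]
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                new[c] = most_central(members)
        if np.array_equal(new, medoids):
            break
        medoids = new

    labels = np.argmin(D[:, medoids], axis=1)
    labels[seeded] = seed_labels[seeded]
    return labels, medoids

# Toy usage: genes 0 and 1 are gold-standard positives (cluster 0).
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [10.0, 10.0], [10.2, 9.9], [9.8, 10.1]])
seeds = np.array([0, 0, -1, -1, -1, -1])
labels, medoids = seeded_kmedoids(X, 2, seeds)
```

In this sketch, unlabeled genes that fall into the seeded cluster (or lie close to its medoid) would be the candidate metastasis-related genes; how the original method scores and thresholds genes is not specified in the abstract.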
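The residual convolutional idea behind the metastasis prediction method can be sketched in NumPy for a single channel. The abstract does not define how the 2D enhanced weights are built, so the hypothetical `to_weighted_grid` simply assumes each gene's expression is scaled by a per-gene weight and laid out row by row on a zero-padded square grid; the function names, kernel sizes, and toy values are assumptions. The residual block itself follows the standard ResNet identity-shortcut form.

```python
import numpy as np

def to_weighted_grid(expr, weights):
    """Hypothetical 2D layout: scale each gene by its weight, zero-pad to the
    next perfect square, and reshape row by row into a square grid."""
    v = np.asarray(expr, dtype=float) * np.asarray(weights, dtype=float)
    side = int(np.ceil(np.sqrt(v.size)))
    grid = np.zeros(side * side)
    grid[: v.size] = v
    return grid.reshape(side, side)

def conv2d_same(x, k):
    """Single-channel 2D cross-correlation with zero 'same' padding (odd kernel)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def residual_block(x, k1, k2):
    """y = ReLU(x + conv(ReLU(conv(x, k1)), k2)): identity shortcut as in ResNet."""
    h = np.maximum(0.0, conv2d_same(x, k1))
    return np.maximum(0.0, x + conv2d_same(h, k2))

# Toy usage with 5 genes -> 3x3 grid (last 4 cells are zero padding).
expr = np.array([1.0, -2.0, 3.0, 0.5, 4.0])
weights = np.array([2.0, 1.0, 0.5, 1.0, 0.25])
x = to_weighted_grid(expr, weights)

rng = np.random.default_rng(0)
k1 = rng.normal(size=(3, 3))
k2 = rng.normal(size=(3, 3))
y = residual_block(x, k1, k2)
```

A trained network would stack several such blocks with learned multi-channel kernels and a classification head; in practice a deep learning framework would replace the explicit loops.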
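One plausible reading of the directed graph convolution kernels is sketched below, assuming a regulation matrix `A` with `A[i, j] = 1` when gene i regulates gene j. Under that assumption, `A @ A.T` links genes that regulate common targets (co-regulating) and `A.T @ A` links genes that share regulators (co-regulated-by); each is symmetrically normalized with self-loops, and one layer aggregates over both with separate weight matrices. This construction and all names are assumptions for illustration, not the exact kernels of the proposed method.

```python
import numpy as np

def sym_normalize(S):
    """Return D^{-1/2} (S + I) D^{-1/2}; self-loops keep every degree >= 1."""
    S = S + np.eye(S.shape[0])
    inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    return S * inv_sqrt[:, None] * inv_sqrt[None, :]

def directed_kernels(A):
    """Assumed encoding: A[i, j] = 1 if gene i regulates gene j.

    Co-regulating kernel   : A @ A.T (genes sharing regulated targets).
    Co-regulated-by kernel : A.T @ A (genes sharing regulators).
    """
    A = np.asarray(A, dtype=float)
    return sym_normalize(A @ A.T), sym_normalize(A.T @ A)

def directed_gcn_layer(H, K_out, K_in, W_out, W_in):
    """One hypothetical layer: ReLU(K_out H W_out + K_in H W_in)."""
    return np.maximum(0.0, K_out @ H @ W_out + K_in @ H @ W_in)

# Toy usage: gene 0 regulates genes 1 and 2; gene 1 regulates gene 2.
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]])
K_out, K_in = directed_kernels(A)

rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))      # per-gene input features
W_out = rng.normal(size=(4, 2))  # weights for the co-regulating branch
W_in = rng.normal(size=(4, 2))   # weights for the co-regulated-by branch
H_next = directed_gcn_layer(H, K_out, K_in, W_out, W_in)
```

Keeping two kernels with separate weights lets the layer treat "regulates the same targets" and "is regulated by the same genes" as distinct signals, which an undirected protein interaction graph cannot express.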
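A multi-kernel support vector machine combines several base kernels into one weighted kernel before training. The abstract does not detail how the sample-set-characterized kernel weights are computed, so the sketch below substitutes a known heuristic: each kernel is weighted in proportion to its centered alignment with the ideal label kernel. The function names, RBF base kernels, and toy data are assumptions.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian kernel exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def centered_alignment(K, Y):
    """Centered alignment between kernel K and the ideal kernel Y @ Y.T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    Kc = H @ K @ H
    Tc = H @ (Y @ Y.T) @ H
    denom = np.linalg.norm(Kc) * np.linalg.norm(Tc)  # Frobenius norms
    return 0.0 if denom == 0 else float(np.sum(Kc * Tc) / denom)

def combine_kernels(kernels, y):
    """Weights proportional to non-negative alignments, normalized to sum to 1."""
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)  # one-hot labels
    a = np.array([max(centered_alignment(K, Y), 0.0) for K in kernels])
    if a.sum() > 0:
        w = a / a.sum()
    else:
        w = np.full(len(kernels), 1.0 / len(kernels))
    return w, sum(wi * K for wi, K in zip(w, kernels))

# Toy usage: one informative feature, one feature unrelated to the labels.
y = np.array([0, 0, 0, 1, 1, 1])
X_info = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
X_noise = np.array([[0.0], [5.0], [0.1], [5.1], [0.2], [5.2]])
kernels = [rbf_kernel(X_info, 0.5), rbf_kernel(X_noise, 0.5)]
w, K = combine_kernels(kernels, y)
```

The combined kernel could then be passed to a precomputed-kernel SVM such as scikit-learn's `SVC(kernel="precomputed")`, optionally with `class_weight="balanced"` to counter class imbalance; batch-effect handling is not modeled in this sketch.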