From research on single-gene (monogenic) diseases to research on complex disease traits, more and more researchers are using computational tools to help solve problems in clinical diagnosis, treatment, and related areas. Modern computing power provides efficient means of processing tens of thousands of biomedical data points, and deep learning models offer a new perspective on disease diagnosis and research. The research goal of this thesis is to explore the role of disease biomarkers. For two types of biological macromolecules, nucleic acids and proteins, this thesis applies pre-trained models and graph network models from deep learning to tasks such as sequence analysis and association prediction. It not only examines the effects of genetic factors, namely circRNAs and the BRCA1 gene, on disease, but also explores protein-protein interactions in pathological mechanisms. The research covers the following three aspects:

1. Prediction of associations between circRNA molecules and diseases. A biological information network is constructed, and topological features of circRNA molecules and diseases are extracted from it. A training method that combines a Positive-Unlabeled (PU) learning strategy with a Deep Forest model is proposed; it addresses the imbalance between positive and negative samples in the data and enables efficient association prediction.

2. Pathogenicity prediction for BRCA1 gene mutations. For missense mutations of the BRCA1 gene, BERT is pre-trained on a corpus of protein sequences related to the gene, and the pre-trained representations are combined with an amino-acid hydrophilicity encoding to classify each mutated sequence as a benign or a pathogenic mutation.

3. Prediction of protein-protein interactions. A protein interaction network is constructed in which the features of protein nodes are learned from protein sequences. Based on an understanding of protein interaction mechanisms, the neighbor relationships of protein nodes are reconstructed to improve the efficiency of message passing, and the resulting network is used as the input for training a Graph Convolutional Network (GCN) model. Finally, the protein node features are incorporated to complete the link prediction task.

The proposed methods are tested and validated on existing data that was screened and preprocessed, and comparative experiments demonstrate their effectiveness. By using computational methods to predict disease-related genes and protein interactions, this thesis aims to further the understanding of disease biomarkers, which will help with clinical disease identification and research into pathological mechanisms.
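For the circRNA-disease association task (aspect 1), the abstract does not say how the PU strategy picks negative training samples from the unlabeled pairs. The sketch below shows one common variant, the two-step "reliable negative" approach, under stated assumptions: each row is a hypothetical topological feature vector for one circRNA-disease pair, and the unlabeled pairs farthest (Euclidean distance) from the centroid of known associations are treated as negatives so that the training set is class-balanced. The Deep Forest classifier that would consume this set is not shown, and the thesis's actual selection rule may differ.

```python
import math

def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def reliable_negatives(positives, unlabeled, k):
    """Assumed PU step 1: rank unlabeled pairs by distance from the positive
    centroid and return the indices of the k farthest ones."""
    c = centroid(positives)
    ranked = sorted(range(len(unlabeled)),
                    key=lambda i: euclidean(unlabeled[i], c),
                    reverse=True)
    return ranked[:k]

def balanced_training_set(positives, unlabeled, k=None):
    """All known positives (label 1) plus k reliable negatives (label 0);
    k defaults to the number of positives, giving a 1:1 class ratio."""
    k = len(positives) if k is None else k
    neg_idx = reliable_negatives(positives, unlabeled, k)
    X = list(positives) + [unlabeled[i] for i in neg_idx]
    y = [1] * len(positives) + [0] * len(neg_idx)
    return X, y, neg_idx

# Toy, hypothetical feature vectors (not data from the thesis).
positives = [[0.9, 0.8], [0.8, 0.9]]
unlabeled = [[0.85, 0.85], [0.1, 0.2], [0.0, 0.1], [0.5, 0.5]]
X, y, neg_idx = balanced_training_set(positives, unlabeled)
# neg_idx == [2, 1]; y == [1, 1, 0, 0]
```

Balancing to a 1:1 ratio is only one option; a PU bagging scheme that resamples the unlabeled set many times would be an equally plausible reading of the abstract.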
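For the BRCA1 pathogenicity task (aspect 2), the sketch below illustrates the sequence-side preprocessing under assumptions: overlapping k-mer tokens as one possible BERT vocabulary unit, applying a missense substitution written in `p.C61G`-style notation, and a per-residue hydrophilicity profile. The Hopp-Woods scale, the window size, the k-mer length, and the choice to concatenate an embedding with the mutant profile and its difference from wild type are all assumptions; the `embedding` argument stands in for a vector from the pre-trained BERT model, which is not shown, and the sequence is a toy string, not real BRCA1.

```python
# Hydrophilicity values as commonly quoted for the Hopp-Woods scale
# (assumed choice of scale; verify against the original table before use).
HOPP_WOODS = {
    "A": -0.5, "R": 3.0, "N": 0.2, "D": 3.0, "C": -1.0,
    "Q": 0.2, "E": 3.0, "G": 0.0, "H": -0.5, "I": -1.8,
    "L": -1.8, "K": 3.0, "M": -1.3, "F": -2.5, "P": 0.0,
    "S": 0.3, "T": -0.4, "W": -3.4, "Y": -2.3, "V": -1.5,
}

def kmer_tokens(seq, k=3):
    """Overlapping k-mers, one possible tokenisation for a protein BERT corpus."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def apply_missense(seq, pos, ref, alt):
    """Replace the residue at 1-based `pos` with `alt`, checking `ref` first."""
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + alt + seq[pos:]

def hydrophilicity_profile(seq, window=1):
    """Per-residue hydrophilicity, smoothed by a centred sliding-window mean."""
    values = [HOPP_WOODS[aa] for aa in seq]
    half = window // 2
    profile = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        profile.append(sum(values[lo:hi]) / (hi - lo))
    return profile

def mutation_features(embedding, wild_type, mutant, window=3):
    """Hypothetical feature vector: embedding + mutant profile + (mutant - wild type)."""
    wt = hydrophilicity_profile(wild_type, window)
    mt = hydrophilicity_profile(mutant, window)
    delta = [m - w for m, w in zip(mt, wt)]
    return list(embedding) + mt + delta

wild_type = "MDLCRW"                                # toy sequence, not real BRCA1
mutant = apply_missense(wild_type, 4, "C", "G")     # -> "MDLGRW"
features = mutation_features([0.1, 0.2], wild_type, mutant, window=1)
```

The delta profile makes the local hydrophilicity change caused by the substitution explicit, which a downstream benign/pathogenic classifier can use alongside the learned embedding.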
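For the protein-protein interaction task (aspect 3), the following NumPy sketch shows one graph-convolution layer with the standard symmetric normalization D^{-1/2}(A + I)D^{-1/2}, followed by a dot-product decoder that scores candidate protein pairs. The neighbor-reconstruction step is a placeholder: `add_two_hop_neighbors`, which links proteins sharing an interaction partner, is a hypothetical rule chosen only for illustration, since the abstract does not state how neighbors are rebuilt. Random values stand in for the sequence-derived node features, and concatenating the GCN output with those features is one assumed way of "combining" them.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric GCN normalisation: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt

def add_two_hop_neighbors(adj):
    """Hypothetical neighbour reconstruction: also connect proteins that share
    a common interaction partner (self-loops removed; GCN adds its own)."""
    two_hop = (adj @ adj > 0).astype(float)
    rewired = np.clip(adj + two_hop, 0.0, 1.0)
    np.fill_diagonal(rewired, 0.0)
    return rewired

def gcn_layer(a_norm, h, w):
    """One graph-convolution step: aggregate neighbours, project, apply ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)

def link_scores(z, pairs):
    """Sigmoid of the embedding dot product for each candidate pair (i, j)."""
    logits = np.array([z[i] @ z[j] for i, j in pairs])
    return 1.0 / (1.0 + np.exp(-logits))

# Toy path network 0-1-2-3; features and weights are random stand-ins.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5))   # stand-in for sequence-derived node features
w = rng.normal(size=(5, 3))   # untrained layer weights
z = gcn_layer(normalize_adjacency(add_two_hop_neighbors(adj)), x, w)
z = np.hstack([z, x])         # combine GCN output with the raw node features
scores = link_scores(z, [(0, 3), (1, 2)])
```

In a real pipeline `w` would be trained with a binary cross-entropy loss over known interacting and sampled non-interacting pairs; the sketch only shows the forward pass.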