| Disease gene prediction plays an important role in biological research and contributes greatly to the study of pathogenic mechanisms. Data from a single biological source are noisy and high-dimensional, which greatly limits their reliability. Because biologists usually integrate multi-source data to avoid bias, this thesis studies methods that fuse multi-source features to predict disease genes. Specifically, it approaches the problem from the perspectives of feature fusion and model fusion. The research content is as follows: (1) To mine the topological information in the gene-gene association network adequately, this thesis constructs a model using network embedding for deep collaborative filtering (NEDCF) and uses structural deep network embedding to extract features from the gene-gene interaction network. After concatenating the features, the NEDCF model uses inductive matrix completion to predict disease genes. Because negative samples are missing in disease gene prediction, this thesis introduces PU learning to balance the risk of treating unlabeled data as negatives. The experimental results show that NEDCF can effectively improve prediction performance. (2) To fuse multi-source data from a feature perspective, this thesis constructs a model with multi-view features for deep collaborative filtering (MVDCF) and introduces deep canonical correlation analysis to model the correlation between multi-source data. To alleviate data sparsity, gene and disease similarity graphs are constructed from gene features and disease features, respectively, to supplement the gene-disease association graph. Furthermore, a graph convolutional neural network is used for feature extraction. In addition, PU learning is introduced to avoid the bias caused by negative sampling. The experimental results show that MVDCF can effectively improve the performance of the algorithm. (3) Considering the lack of negative samples in the traditional disease gene prediction problem, this thesis constructs a model using a dual graph convolutional neural network for deep collaborative filtering (DDCF) and introduces a preference-based model to avoid the bias caused by treating unlabeled samples as negative samples. Moreover, because the preference-based model can only learn local information, this thesis adopts unsupervised learning to integrate it with the traditional dot product-based method, so that both global and local information are taken into account. Experiments show that DDCF can effectively alleviate the problem of missing negative samples in the PU scenario. |
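
As a rough illustration of how inductive matrix completion can be combined with PU learning, the sketch below scores each gene-disease pair as a bilinear form of gene and disease features and down-weights unlabeled pairs in a squared-error loss. It is a minimal NumPy sketch under stated assumptions, not the NEDCF implementation: the function names (`imc_scores`, `pu_loss`, `fit_imc`), the weight `alpha`, the rank, the regularization strength `lam`, and plain gradient descent are all hypothetical choices made for this example.

```python
import numpy as np


def imc_scores(X, Y, W, H):
    """Score every gene-disease pair as x_g^T (W H^T) y_d."""
    return X @ W @ H.T @ Y.T


def pu_loss(P, S, alpha):
    """Squared error in which unlabeled pairs (P == 0) are down-weighted by alpha."""
    weights = np.where(P == 1, 1.0, alpha)
    return float(np.sum(weights * (S - P) ** 2))


def fit_imc(P, X, Y, rank=2, alpha=0.2, lam=0.01, lr=0.01, steps=500, seed=0):
    """Fit the low-rank factors W and H by gradient descent (all settings are assumptions)."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((X.shape[1], rank))
    H = 0.1 * rng.standard_normal((Y.shape[1], rank))
    weights = np.where(P == 1, 1.0, alpha)
    for _ in range(steps):
        # Weighted residual between predicted scores and known associations.
        R = weights * (imc_scores(X, Y, W, H) - P)
        # Gradients of the weighted squared loss plus L2 regularization.
        grad_W = 2 * X.T @ R @ Y @ H + 2 * lam * W
        grad_H = 2 * Y.T @ R.T @ X @ W + 2 * lam * H
        W -= lr * grad_W
        H -= lr * grad_H
    return W, H
```

Here `P` is a binary gene-disease association matrix in which 1 marks a known association and 0 marks an unlabeled pair, while `X` and `Y` hold one feature row per gene and per disease. Setting `alpha` below 1 reflects the assumption that an unlabeled pair is not necessarily a true negative.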