Multi-Source Data Fusion For Disease Genes Prediction

Posted on:2022-07-10

Degree:Master

Type:Thesis

Country:China

Candidate:Y L Lin

GTID:2530306326473434

Subject:Computer Science and Technology

Abstract/Summary:

Disease gene prediction plays an important role in biological research and contributes greatly to the study of pathogenic mechanisms. Because data from a single biological source are noisy and high-dimensional, their reliability suffers. Given that biologists usually integrate multi-source data to avoid bias, this thesis studies methods that fuse multi-source features to predict disease genes. Specifically, it approaches the problem from the perspectives of feature fusion and model fusion. The research content is as follows:

(1) To adequately mine the topological information in the gene-gene association network, this thesis constructs a model that uses network embedding for deep collaborative filtering (NEDCF) and applies structural deep network embedding to extract features of the gene-gene interaction network. After concatenating the features, the NEDCF model uses inductive matrix completion to predict disease genes. To address the lack of negative samples in disease gene prediction, the thesis introduces PU (positive-unlabeled) learning to balance the risk of treating unlabeled data as negatives. The experimental results show that NEDCF can effectively improve performance.

(2) To fuse multi-source data from a feature perspective, this thesis constructs a model with multi-view features for deep collaborative filtering (MVDCF) and introduces deep canonical correlation analysis to model the correlation between the multi-source data. To mitigate data sparsity, similarity graphs of genes and diseases are constructed from gene features and disease features, respectively, to supplement the gene-disease association graph. A graph convolutional neural network is then used for feature extraction. PU learning is also introduced to avoid the bias caused by negative sampling. The experimental results show that MVDCF can effectively improve the performance of the algorithm.

(3) Considering the lack of negative samples in the traditional disease gene prediction problem, this thesis constructs a model that uses a dual graph convolutional neural network for deep collaborative filtering (DDCF) and introduces a preference-based model to avoid the bias caused by treating unlabeled samples as negative samples. Moreover, because the preference-based model can only learn local information, the thesis adopts unsupervised learning to integrate it with the traditional dot-product-based method, so that both global and local information are taken into account. Experiments show that DDCF can effectively alleviate the problem of missing negative samples in the PU scenario.
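
As a rough illustration of two ideas named in the abstract, inductive matrix completion and PU learning, the sketch below scores gene-disease pairs as X W Hᵀ Yᵀ and down-weights unlabeled entries in the loss. It is not the thesis's NEDCF implementation. The function names, the weight `alpha`, the squared-error loss, and the plain gradient-descent fit are all assumptions chosen for brevity; the thesis may balance the unlabeled risk in a different way.

```python
import numpy as np

def imc_scores(X, Y, W, H):
    """Inductive matrix completion score matrix S = X W H^T Y^T.

    X: gene features (n_genes x f_g), Y: disease features (n_diseases x f_d),
    W: (f_g x rank), H: (f_d x rank). All names are hypothetical.
    """
    return X @ W @ H.T @ Y.T

def pu_weighted_loss(S, A, alpha=0.2):
    """Squared error where unlabeled entries (A == 0) get weight alpha < 1.

    This is one common PU-style heuristic, assumed here for illustration.
    """
    weights = np.where(A == 1, 1.0, alpha)
    return float(np.sum(weights * (S - A) ** 2))

def fit(X, Y, A, rank=4, alpha=0.2, lr=0.01, steps=300, seed=0):
    """Fit W and H by plain gradient descent on the weighted loss."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((X.shape[1], rank))
    H = 0.1 * rng.standard_normal((Y.shape[1], rank))
    weights = np.where(A == 1, 1.0, alpha)
    for _ in range(steps):
        R = weights * (imc_scores(X, Y, W, H) - A)  # weighted residual
        grad_W = 2 * X.T @ R @ Y @ H                # dLoss/dW
        grad_H = 2 * Y.T @ R.T @ X @ W              # dLoss/dH
        W -= lr * grad_W
        H -= lr * grad_H
    return W, H

if __name__ == "__main__":
    # Synthetic data only; no real gene or disease features are used here.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((12, 3)) / np.sqrt(12)
    Y = rng.standard_normal((8, 3)) / np.sqrt(8)
    A = (rng.random((12, 8)) < 0.2).astype(float)  # 1 = known association
    W, H = fit(X, Y, A)
    print(pu_weighted_loss(imc_scores(X, Y, W, H), A))
```

Because the scores depend on gene and disease features rather than on per-entity latent vectors alone, the same W and H can score genes or diseases that were not seen during fitting, which is the "inductive" part of the technique.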

Keywords/Search Tags:

Disease Genes Prediction, Collaborative Filtering, Graph Convolutional Neural Network

Related items

1	Prediction Of Disease Genes Based On Nonlinear Induction Matrix Completion Model
2	Graph-based Machine Learning Algorithms For Microbe Network Prediction
3	Prediction Of MicroRNA-disease Association Based On Graph Convolutional Network
4	Research On The Prediction Method Of MicroRNA And Disease Association Based On Two-layer Network
5	Research On Disease-related CircRNA Prediction Method Based On Graph Neural Network
6	The Study And Application Of Multiplex Graph Based On Graph Convolutional Neural Network
7	Research On Graph Embedding Model Based On Deep Neural Networks
8	Disease-lncRNA Association Prediction Based On Machine Learning And Convolutional Neural Network
9	MiRNA-disease Associations Prediction Research Based On Graph Autoencoders And Collaborative Training
10	Research On Temperature Prediction Based On Graph Convolutional Neural Network