
Research On LncRNA-Disease Association Prediction Based On Deep Matrix Factorization

Posted on: 2022-11-16
Degree: Master
Type: Thesis
Country: China
Candidate: J Qu
GTID: 2480306761459724
Subject: Automation Technology
Abstract/Summary:
Long non-coding RNAs (lncRNAs) are an important class of non-coding RNA in the human body. They are longer than 200 nucleotides and account for a large proportion of non-coding RNAs. In recent years, a growing number of studies have shown that lncRNAs participate in many biological regulatory processes and play crucial roles at every stage of gene regulation. These roles include cell division and differentiation, immune responses, cell metabolism, and the pathological characteristics of diseases, so lncRNAs are closely linked to the onset of many diseases. LncRNAs have therefore become new candidate molecules and targets for the diagnosis and treatment of some diseases. Exploring lncRNA-disease associations helps us analyze the complex pathological mechanisms of diseases at the molecular level, and it offers important guidance for early diagnosis, treatment, late-stage care, and related drug development. However, identifying lncRNA-disease associations through routine biomedical experiments is time-consuming, laborious, and costly. Because the functions and disease associations of many lncRNAs are still unknown, it is of great significance to develop computational methods that predict lncRNA-disease associations rapidly and effectively.

Existing methods are mainly based on matrix factorization and network representation learning, and their main challenge is data sparsity. In addition, many existing methods consider only the associations between the surface-level features of lncRNAs and diseases, while ignoring possible associations between their underlying features. This paper proposes DMFLD, an lncRNA-disease association prediction model based on deep matrix factorization. DMFLD is a multi-dimensional knowledge aggregation and fusion model that combines a multi-head attention mechanism, a distance layer network, an angle layer network, and a deep neural network. It can therefore explore, from multiple dimensions, highly nonlinear representations of the latent features shared by lncRNAs and diseases, further improving prediction accuracy. The main work includes the following aspects:
1) To address the sparsity of the high-dimensional association matrix, the lncRNA-disease association matrix is decomposed using word2vec to obtain the latent factors (features) of lncRNAs and diseases.
2) A factorization machine based on a multi-head attention mechanism is proposed to automatically construct highly correlated combinations of latent features, so as to fully capture the intrinsic relationships between the latent high-order features of lncRNAs and diseases.
3) To describe the correlation between lncRNAs and diseases in the spatial dimension, a distance layer network and an angle layer network are constructed.
4) A deep neural network is introduced to learn nonlinear representations of the latent features of lncRNAs and diseases and to improve the accuracy of lncRNA-disease association prediction. Our method does not require extensive additional similarity computation, and it uses mini-batch stochastic gradient descent to reduce the high time complexity and speed up training. A regularization constraint is introduced to prevent overfitting. Finally, the deep neural network, the multi-head attention network, the distance layer network, and the angle layer network are combined, and a multi-layer perceptron produces the prediction.
5) The large-scale LncRNADisease v2.0 dataset is used so that more valuable results can be explored, and five metrics commonly used in recommender systems are adopted: Hit Ratio (HR), Normalized Discounted Cumulative Gain (NDCG), Mean Reciprocal Rank (MRR), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). First, to verify how the choice of matrix factorization model affects the results, we designed a comprehensive experiment comparing different matrix factorization models (Funk SVD, SVD++, NMF) and conducted systematic tuning experiments on the important matrix factorization parameters. Then, the proposed deep matrix factorization method is compared with current state-of-the-art methods, traditional machine learning models, and deep learning frameworks. The experimental results show that deep matrix factorization outperforms the other methods on all metrics. In addition, case studies show that DMFLD predicts lncRNAs associated with three diseases (lung cancer, cervical cancer, and colorectal cancer) with high accuracy.
6) To make the model as convenient as possible for other researchers, we developed a web server for lncRNA-disease association prediction based on deep matrix factorization (http://lddmf.natapp1.cc/).
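The abstract names Funk SVD among the compared matrix factorization models and says training uses mini-batch stochastic gradient descent with a regularization constraint. The sketch below shows only that baseline ingredient: a Funk-SVD-style factorization R ≈ PQᵀ of a 0/1 lncRNA × disease matrix, trained with mini-batch SGD and an L2 penalty. It is not DMFLD itself. Treating every unknown pair as 0, the hyperparameter defaults, and the toy matrix are all hypothetical choices made for illustration.

```python
import math
import random

def train_funk_svd(R, k=2, lr=0.1, reg=0.01, epochs=300, batch_size=4, seed=0):
    """Funk-SVD-style factorization R ~ P Q^T trained with mini-batch SGD.

    R is a dense 0/1 lncRNA x disease matrix. Unknown pairs are treated as 0,
    a simplifying assumption for this sketch. Defaults are hypothetical.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(R), len(R[0])
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_rows)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_cols)]
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    for _ in range(epochs):
        rng.shuffle(cells)
        for start in range(0, len(cells), batch_size):
            batch = cells[start:start + batch_size]
            grad_P = [[0.0] * k for _ in range(n_rows)]
            grad_Q = [[0.0] * k for _ in range(n_cols)]
            for i, j in batch:
                err = R[i][j] - sum(P[i][f] * Q[j][f] for f in range(k))
                for f in range(k):
                    # gradient of the squared error plus an L2 penalty on the factors
                    grad_P[i][f] += -err * Q[j][f] + reg * P[i][f]
                    grad_Q[j][f] += -err * P[i][f] + reg * Q[j][f]
            # average the accumulated gradients over the mini-batch, then step
            for i in range(n_rows):
                for f in range(k):
                    P[i][f] -= lr * grad_P[i][f] / len(batch)
            for j in range(n_cols):
                for f in range(k):
                    Q[j][f] -= lr * grad_Q[j][f] / len(batch)
    return P, Q

def training_rmse(R, P, Q):
    """Root mean square reconstruction error over all cells of R."""
    errs = [(R[i][j] - sum(p * q for p, q in zip(P[i], Q[j]))) ** 2
            for i in range(len(R)) for j in range(len(R[0]))]
    return math.sqrt(sum(errs) / len(errs))

# Hypothetical toy data: 4 lncRNAs x 3 diseases (1 = known association).
R = [[1, 0, 1],
     [1, 0, 0],
     [0, 1, 0],
     [1, 0, 1]]
P, Q = train_funk_svd(R)
print(training_rmse(R, P, Q))
```

After training, each score `sum(P[i][f] * Q[j][f])` for a pair that was 0 in R serves as a candidate-association score. Mini-batching trades the per-cell noise of plain SGD for cheaper, smoother updates.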
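Contribution 2) describes a factorization machine with multi-head attention that weights combinations of latent features. One common way to realize this idea, loosely following the attentional factorization machine (AFM) design, is sketched below. It forms the FM second-order interactions (element-wise products of every pair of field vectors), lets each head score those interactions with a softmax, pools them, and averages the heads. The per-head query vectors and the projection vector are hypothetical stand-ins for learned parameters. AFM proper scores pairs with a small MLP rather than a single dot product, and the thesis's exact attention form is not stated in the abstract.

```python
import math
from itertools import combinations

def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attentive_fm_interaction(fields, head_queries, proj):
    """Multi-head attention pooling over FM-style pairwise interactions.

    fields       : n >= 2 latent field vectors, each of length k
    head_queries : H hypothetical per-head query vectors of length k
    proj         : hypothetical length-k vector mapping pooled interactions to a scalar
    """
    # FM second-order interactions: element-wise product for every pair of fields
    interactions = [[a * b for a, b in zip(fields[i], fields[j])]
                    for i, j in combinations(range(len(fields)), 2)]
    head_scores = []
    for q in head_queries:
        # each head scores every pairwise interaction and normalizes with softmax
        weights = softmax([sum(qf * xf for qf, xf in zip(q, x)) for x in interactions])
        pooled = [sum(w * x[f] for w, x in zip(weights, interactions))
                  for f in range(len(proj))]
        head_scores.append(sum(p * z for p, z in zip(proj, pooled)))
    # average the heads into a single interaction score
    return sum(head_scores) / len(head_scores)

# Hypothetical latent fields (k = 2) and two heads.
fields = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(attentive_fm_interaction(fields, [[0.0, 0.0], [5.0, 0.0]], [1.0, 1.0]))
```

With an all-zero query a head weights every pair equally, which reduces to the mean of the plain FM inner products. A nonzero query shifts weight toward the pairs it favors.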
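Contributions 3) and 4) describe distance and angle layers that relate lncRNA and disease latent vectors spatially, with all branch outputs finally combined by a multi-layer perceptron. A minimal sketch, assuming the distance layer reduces to Euclidean distance and the angle layer to cosine similarity, might look like this. The fusion step is a one-hidden-layer perceptron with a sigmoid output. All weights here are hypothetical placeholders for learned parameters, and the thesis's actual layer designs may be richer.

```python
import math

def distance_feature(u, v):
    """Euclidean distance between an lncRNA and a disease latent vector."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def angle_feature(u, v):
    """Cosine of the angle between the two latent vectors (0.0 if either is all zeros)."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def fuse(features, hidden_w, hidden_b, out_w, out_b):
    """One-hidden-layer perceptron mapping concatenated branch outputs to a probability."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features)) + b)  # ReLU units
              for row, b in zip(hidden_w, hidden_b)]
    return sigmoid(sum(w * h for w, h in zip(out_w, hidden)) + out_b)

# Hypothetical latent vectors and branch outputs (DNN score, attention score).
u, v = [1.0, 2.0], [2.0, 4.0]
features = [0.3, 0.7, distance_feature(u, v), angle_feature(u, v)]
print(fuse(features, [[0.5, 0.5, -0.2, 1.0]], [0.0], [1.0], 0.0))
```

Distance and angle capture different geometry. In the usage example the two vectors point in the same direction (cosine 1) yet lie √5 apart, which is why a model might keep both signals.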
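The five metrics listed in contribution 5) have standard definitions. The abstract does not give the evaluation protocol, so the sketch below assumes a common recommender-system setup: one held-out disease per lncRNA, ranked among candidates by predicted score. With a single relevant item the ideal DCG is 1, so NDCG@K reduces to 1/log2(rank + 1). The disease identifiers are hypothetical.

```python
import math

def hit_ratio_at_k(ranked, target, k):
    """1.0 if the held-out item appears in the top-k of the ranked list, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0

def ndcg_at_k(ranked, target, k):
    """NDCG@k with a single relevant item: 1/log2(rank + 1) if rank <= k."""
    if target not in ranked[:k]:
        return 0.0
    rank = ranked.index(target) + 1  # 1-based position
    return 1.0 / math.log2(rank + 1)

def reciprocal_rank(ranked, target):
    """1/rank of the held-out item (0.0 if absent); averaging over lncRNAs gives MRR."""
    return 1.0 / (ranked.index(target) + 1) if target in ranked else 0.0

def rmse_score(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae_score(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical ranking for one lncRNA; "d7" is the held-out true association.
ranked = ["d4", "d1", "d7", "d2"]
print(hit_ratio_at_k(ranked, "d7", 3), ndcg_at_k(ranked, "d7", 3), reciprocal_rank(ranked, "d7"))
```

HR, NDCG, and MRR judge the ranking. RMSE and MAE judge the raw predicted scores against the 0/1 labels, so the two groups can disagree.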
Keywords/Search Tags: LncRNA, disease, association, potential features, matrix factorization, factorization machines, attentional mechanism