| The morbidity and mortality of many complex diseases are increasing, which seriously threatens people’s lives and health. Long non-coding RNAs (lncRNAs) have been found to be closely related to the occurrence and development of complex diseases; they affect human development and the occurrence of numerous diseases by participating in the regulation of target genes. Predicting disease-associated lncRNAs can therefore support early medical diagnosis and accelerate the understanding of disease pathogenesis. However, traditional biological experiments to identify lncRNA-disease associations (LDAs) are often inefficient and costly. In recent years, advances in high-throughput sequencing technology and the accumulation of theory have expanded the availability and quantity of association data. As a result, developing efficient and reliable computational models to predict LDAs has become a hot research topic in bioinformatics. This thesis integrates multiple kinds of biological similarity information with known associations, and develops four prediction models to explore potential associations between lncRNAs and complex diseases and to guide biological experiments. The specific research contents are as follows: (1) To address the sensitivity of traditional machine learning models to noise and outliers, a novel collaborative matrix factorization method (LDCMFC) is developed to predict LDAs. LDCMFC introduces the Gaussian interaction profile into the prediction framework, which effectively fuses the network similarity information of lncRNAs and diseases. At the same time, correntropy is introduced into the collaborative matrix factorization model: maximizing correntropy replaces the traditional minimization of Euclidean distance, which improves the robustness of the algorithm. In addition, sparse constraints are applied to the loss function, which reduces the matrix complexity
and the difficulty of analysis, and improves the prediction performance of the algorithm. (2) To address the difficulty existing prediction models have in making full use of multiple biological data sources, a method based on heterogeneous networks and a stacked autoencoder (HSAELDA) is proposed to predict LDAs. HSAELDA integrates experiment-supported miRNA-disease associations (MDAs) and LDAs, disease semantic similarity (DSS), lncRNA functional similarity (LFS), and lncRNA-miRNA interactions (LMI) as input features. Stacked autoencoders then learn a latent feature representation of the original input features. Finally, a LightGBM classifier is trained. (3) To address the problem that the biomedical information used for LDA prediction comes from too few sources, a framework based on geometric complement heterogeneous information and a convolutional neural network (HCNNLDA) is proposed to predict LDAs. First, HCNNLDA uses geometric complement heterogeneous information to integrate lncRNA-miRNA associations and miRNA-disease associations. Then, a convolutional neural network learns a low-dimensional representation of the original feature vectors, yielding the optimal subspace. In addition, considering the possible nonlinear relationships between features after dimensionality reduction, XGBoost is used to predict potential LDAs. (4) To address the limited variety of feature information and the sparsity of the known association network, a method based on heterogeneous networks and a graph attention deep autoencoder (GATELDA) is proposed to predict LDAs. First, GATELDA combines linear and nonlinear features. The linear features of diseases and miRNAs are constructed from disease-lncRNA correlation profiles and miRNA-lncRNA correlation profiles, respectively. Then, a graph attention network extracts the nonlinear features of diseases and miRNAs by aggregating information from each neighbor with different
weights. Finally, the linear and nonlinear features are fused, and a deep neural network is used to infer LDAs. The proposed methods have been applied to LDA datasets, and the results show that they achieve stable predictive performance and effectively learn from similarity information and known associations. They are not only effective in predicting potential associations, but also outperform existing similar methods. |
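
The Gaussian interaction profile similarity mentioned for LDCMFC can be illustrated with a minimal sketch. It follows the commonly used Gaussian interaction profile kernel, in which two entities are similar when their rows of the association matrix are close. The abstract does not give the thesis's exact formulation, so the function name `gip_kernel`, the bandwidth parameter `gamma_prime`, and the toy matrix `A` are assumptions made purely for illustration.

```python
import numpy as np


def gip_kernel(assoc: np.ndarray, gamma_prime: float = 1.0) -> np.ndarray:
    """Gaussian interaction profile kernel between the rows of `assoc`.

    Each row is the binary interaction profile of one entity
    (e.g. one lncRNA across all diseases). `gamma_prime` is a
    hypothetical bandwidth setting, not a value from the thesis.
    """
    profiles = np.asarray(assoc, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    mean_sq_norm = sq_norms.mean()
    n = profiles.shape[0]
    if mean_sq_norm == 0:
        # No known associations at all: every profile is identical.
        return np.ones((n, n))
    # Normalise the bandwidth by the average squared profile norm.
    gamma = gamma_prime / mean_sq_norm
    # Pairwise squared Euclidean distances between profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)


# Toy lncRNA x disease association matrix (illustrative data only).
A = np.array([
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
])

lnc_sim = gip_kernel(A)    # similarity between lncRNAs (rows of A)
dis_sim = gip_kernel(A.T)  # similarity between diseases (columns of A)
```

Here the first two lncRNAs share an identical profile and therefore receive the maximum similarity, while the third, which shares no disease with them, receives a lower value. Similarity matrices of this kind are the sort of network information a collaborative matrix factorization model can use as side information.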