Long non-coding RNAs (lncRNAs) are non-coding RNA transcripts longer than 200 nucleotides. Existing studies have shown that abnormal lncRNA expression is closely related to complex human diseases such as cancer, neurological diseases, and cardiovascular diseases. Accurate identification of disease-related lncRNAs is therefore important for studying the pathological mechanisms, treatment, and prognosis of complex diseases. Because identifying disease-associated lncRNAs directly through biological experiments is slow and costly, researchers have designed a series of computational methods to predict them, with good results. Existing lncRNA-disease association prediction methods fall mainly into three groups: biological network-based methods, matrix decomposition-based methods, and machine learning-based methods. All of them face the major challenge of sparse data. Most consider only shallow association information between lncRNA and disease features and do not capture the deeper latent connections between them. To improve the accuracy of disease-related lncRNA prediction, this paper proposes a lncRNA-disease association prediction algorithm based on geometric complementary heterogeneous information and random forest. First, the geometric complementary heterogeneous information approach is used to integrate experimentally validated lncRNA-miRNA interaction information and miRNA-disease association information. Second, lncRNA and disease feature information, including their respective similarity coefficients, is fused into the sample feature space. Third, an auto-encoder maps the high-dimensional sample feature space to a low-dimensional space that represents the lncRNA-disease association samples. Finally, a random forest classifier is trained on the low-dimensional lncRNA-disease sample space to predict lncRNA-disease associations. In 5-fold cross-validation experiments, the proposed algorithm reached an area under the receiver operating characteristic curve (AUC) of 0.9897 and an area under the precision-recall curve (AUPR) of 0.7040, exceeding several existing similar algorithms. In addition, case studies on colon cancer, gastric cancer, and breast cancer showed that the proposed algorithm predicts disease-associated lncRNAs well.
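To make the two reported metrics concrete, the following minimal sketch computes AUC and an AUPR estimate for a toy set of predicted association scores. It is an illustration only and not the evaluation code used in this paper: the function names `roc_auc` and `average_precision`, the toy labels and scores, and the choice of average precision as the AUPR estimator are all assumptions made for the example. The toy data are imbalanced (2 known associations among 10 candidate pairs), which mimics in miniature the sparse-data setting described above.

```python
# Illustrative sketch only: toy data and helper names are hypothetical,
# not taken from the paper.

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision, one common estimator of AUPR: the mean of the
    precision values measured at the rank of each positive sample.
    Assumes there are no tied scores."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits


# Toy example: 1 = known lncRNA-disease association, 0 = unlabeled pair.
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
toy_scores = [0.90, 0.80, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05, 0.01]

print("AUC :", roc_auc(toy_labels, toy_scores))
print("AUPR:", average_precision(toy_labels, toy_scores))
```

In this toy case a single negative pair ranked above one positive lowers the average precision more than it lowers the AUC. This is one reason the two metrics can differ noticeably when positive samples are rare.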