| In the human genome, about 98% of DNA sequences do not encode proteins, but instead produce various types of non-coding RNAs that are not translated into proteins. Among them, microRNAs (miRNAs) are a class of non-coding RNAs only about 22 nucleotides long. They have many important biological functions, and their abnormal expression is associated with the occurrence and development of many complex human diseases. Identifying miRNAs related to human diseases is of great significance for a deeper understanding of disease pathogenesis at the miRNA level, and has important theoretical and practical value for the diagnosis, treatment, and prognosis of diseases. Treatment of diseases requires drugs, and the targets of most current drugs are proteins. Scientists have pointed out that non-coding RNAs can also serve as drug targets. Identifying non-coding RNA molecules related to drugs can promote drug target selection, assist drug research and development, and ultimately contribute to the treatment of diseases. Identifying disease-related miRNAs and non-coding RNAs associated with small molecule drugs through biological experiments is time-consuming and inefficient. Therefore, this dissertation proposes computational methods to predict disease-related miRNAs and drug-related non-coding RNAs. The research work focuses on three main aspects. In the field of disease-related miRNA prediction, this dissertation proposed two prediction models: NSESVM, based on a reliable negative sample extraction strategy and an improved support vector machine, and SAEMDA, based on a stacked autoencoder. In NSESVM, to address the difficulty of obtaining negative samples in disease-related miRNA prediction, this study proposed a strategy to extract reliable negative samples from
unlabeled disease-miRNA samples using two PU learning classifiers. In SAEMDA, to tackle the class imbalance between labeled and unlabeled samples in disease-related miRNA prediction, this study constructed a stacked autoencoder neural network and trained its parameters through unsupervised pre-training followed by supervised fine-tuning, making full use of the information in both labeled and unlabeled samples. The prediction performance of NSESVM and SAEMDA was tested comprehensively using global leave-one-out cross-validation (LOOCV), local LOOCV, 5-fold cross-validation, and three different types of case studies. The results show that NSESVM and SAEMDA have good prediction accuracy and stable prediction performance. In the field of drug-related miRNA prediction, this dissertation presented two prediction models: RFSMMA, based on random forest, and CLDISMMA, based on cross-layer dependency inference. The RFSMMA model integrated multiple sources of feature information to construct feature vectors, and selected the more robust features among them to distinguish associated from non-associated samples more efficiently. The random forest algorithm provides an unbiased estimate of the generalization error, so the RFSMMA prediction model has good generalization ability. In the CLDISMMA model, the dissertation introduced disease information into the problem of drug-related miRNA prediction, which made the data more comprehensive, and built a regularized optimization model on the three-layer heterogeneous drug-miRNA-disease network. Solving the objective function of this optimization model with a block coordinate descent algorithm yields the predictions of small molecule drug-related miRNAs. This dissertation evaluated the prediction accuracy of RFSMMA and CLDISMMA using global LOOCV, two types of local LOOCV, and 5-fold cross-validation. Case studies on
several important small molecule drugs were conducted to further evaluate the prediction performance of RFSMMA and CLDISMMA. The results show that the RFSMMA and CLDISMMA models are reliable and stable. In the field of drug-related lncRNA prediction, this dissertation constructed a drug-lncRNA association dataset and designed a unified supervised learning framework for predicting small molecule drug-related lncRNAs. The drug-lncRNA association dataset, built from the D-lnc database, fills the gap left by the lack of a unified and organized dataset in this field. Methods for calculating the similarity matrices of small molecule drugs and of lncRNAs were designed using the chemical structure fingerprints of the drugs and the sequence information of the lncRNAs. Based on the small molecule drug-lncRNA association matrix, the small molecule drug similarity matrix, and the lncRNA similarity matrix, a supervised learning-based framework for predicting small molecule drug-related lncRNAs was proposed, and four drug-related lncRNA prediction models, SLPDT, SLPRF, SLPAda, and SLPNB, were constructed within this framework. This dissertation evaluated the reliability of the small molecule drug-lncRNA association dataset and the effectiveness of the supervised learning-based framework from multiple perspectives, including global LOOCV, two kinds of local LOOCV, 5-fold cross-validation, and two case studies. The cross-validation results and the case studies indicate that the small molecule drug-lncRNA association dataset constructed in this study is reliable, and that the four small molecule drug-related lncRNA prediction models built on the unified framework have good prediction accuracy. |
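
As an illustration of how the three matrices named in the abstract could be turned into a supervised learning problem, the following is a minimal sketch and not the dissertation's actual method. The abstract does not state the similarity measure or the feature construction, so the Tanimoto coefficient on fingerprint bits, the concatenation of similarity rows, and every name here (`tanimoto`, `similarity_matrix`, `build_samples`, the toy data) are assumptions made only for this example.

```python
from itertools import product


def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices.

    Assumption: the abstract does not say which similarity measure is used.
    """
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def similarity_matrix(fingerprints):
    """Pairwise similarity matrix (list of rows) for a list of fingerprints."""
    n = len(fingerprints)
    return [[tanimoto(fingerprints[i], fingerprints[j]) for j in range(n)]
            for i in range(n)]


def build_samples(assoc, drug_sim, lnc_sim):
    """Build one sample per (drug, lncRNA) pair.

    Assumed feature construction: the drug's similarity row concatenated
    with the lncRNA's similarity row. The label is the corresponding entry
    of the association matrix (1 = known association, 0 = unlabeled).
    """
    features, labels = [], []
    for d, l in product(range(len(drug_sim)), range(len(lnc_sim))):
        features.append(drug_sim[d] + lnc_sim[l])
        labels.append(assoc[d][l])
    return features, labels


# Hypothetical toy data: 3 drugs, 2 lncRNAs.
drug_fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
drug_sim = similarity_matrix(drug_fps)
lnc_sim = [[1.0, 0.4], [0.4, 1.0]]    # stand-in for a sequence-based similarity
assoc = [[1, 0], [0, 1], [0, 0]]      # rows: drugs, columns: lncRNAs

X, y = build_samples(assoc, drug_sim, lnc_sim)
```

Feature vectors and labels of this shape could then be passed to any standard classifier, such as a decision tree, random forest, AdaBoost, or naive Bayes model, and scored with LOOCV or 5-fold cross-validation.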