Research On LncRNA-disease Association Prediction Method Based On Random Forest

Posted on: 2024-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: M R Zhang
Full Text: PDF
GTID: 2554306923988749
Subject: Electronic information
Abstract/Summary:
In recent years, the incidence of complex diseases has risen steadily, and research has shown that long non-coding RNAs (lncRNAs) are associated with the occurrence and development of diseases. lncRNAs are RNA molecules longer than 200 nucleotides that are not translated into proteins. They regulate gene expression levels in various ways and thereby affect the development of diseases. Although traditional biological experiments can accurately identify lncRNAs, the identification process is time-consuming and demands substantial manpower and resources because of the extremely large number of lncRNAs in the human body. Computational models offer a new approach to identifying associations between lncRNAs and diseases: they save cost and time and serve as an auxiliary tool for identifying lncRNA-disease associations. Machine learning algorithms have recently been applied to association analysis and have become an important tool for studying the mechanisms of complex diseases. Among them, the random forest algorithm has been widely adopted for its good generalization ability and robustness. This dissertation uses random forest as the prediction algorithm and proposes three computational models to predict potential associations between lncRNAs and diseases, as follows:

(1) To address the difficulty of handling noise and redundant information in lncRNA-disease association data, a prediction method based on random forest and Lasso feature extraction (MHILDA) is proposed to explore potential associations between diseases and lncRNAs. MHILDA integrates three different data sources covering lncRNAs, miRNAs and diseases, which enables the predictor to better learn prior knowledge during training. Meanwhile, MHILDA uses Lasso to extract key features from the sample features; this evaluates feature importance, removes redundant data and improves the computational efficiency of the prediction method.

(2) To address the problem that existing models predict lncRNA-disease associations from a single perspective, a dual-path random forest-based computational method (iLncDA-RSN) is proposed to predict lncRNA-disease associations. iLncDA-RSN integrates lncRNA-disease associations from both the disease perspective and the lncRNA perspective. In addition, unlike traditional methods, iLncDA-RSN does not directly fuse similarity networks into the model but uses random walk with restart to integrate multiple different types of networks, thereby better mining the latent information in the networks.

(3) To address the severe imbalance between positive and negative samples, an ensemble random forest-based prediction method (iden LD-AREL) is proposed to score unknown lncRNA-disease associations. The method resamples the unlabeled samples to construct multiple different balanced training subsets. Meanwhile, iden LD-AREL ensembles multiple random forests that learn from these training subsets, which can effectively improve the generalization ability of the model and thus better predict potential lncRNA-disease associations.

All of the proposed methods use random forests as predictors and predict disease-related lncRNAs with stable performance. The experimental results show that the methods proposed in this dissertation compare favorably with similar methods and can effectively predict potential lncRNA-disease associations.
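The random walk with restart mentioned for iLncDA-RSN can be illustrated with a minimal sketch. The abstract does not say how the method combines several networks or which restart probability it uses, so the code below shows only the generic single-network algorithm. The function names, the 0.5 restart probability and the toy three-node network are assumptions made for illustration and are not taken from the dissertation.

```python
def column_normalize(adj):
    """Scale each column of a square matrix so that it sums to 1 (zero columns stay zero)."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until p stops changing.

    adj     : square similarity/adjacency matrix as a list of lists (hypothetical input)
    seed    : index of the node where the walker starts and restarts
    restart : probability of jumping back to the seed at each step (assumed value)
    """
    n = len(adj)
    w = column_normalize(adj)
    p0 = [0.0] * n
    p0[seed] = 1.0
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1.0 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        change = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if change < tol:
            break
    return p


# Toy network: a chain 0 - 1 - 2, with the walker seeded at node 0.
toy_network = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
scores = random_walk_with_restart(toy_network, seed=0)
```

The resulting scores rank nodes by proximity to the seed, so the seed scores highest and the most distant node lowest. In a pipeline such as the one the abstract outlines, vectors like this could serve as network-derived features for a random forest, although how the dissertation actually builds its features is not stated here.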
Keywords/Search Tags:LncRNA-disease association, Random forest, Computational model, Feature extraction, Ensemble learning