Prediction Of The Relationship Between CircRNAs And Complex Diseases Based On Heterogeneous Networks And Multi-data Fusion

Posted on: 2021-12-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Y Fan
GTID: 1480306044497014
Subject: Computer software and theory
Abstract/Summary:
CircRNAs are a class of endogenous non-coding RNAs that form a covalently closed continuous loop lacking 5' and 3' polarity. CircRNAs can regulate gene expression at the transcriptional or post-transcriptional level by titrating miRNAs, regulating transcription and interfering with splicing. They also play a critical role in biological processes including transcription, mRNA splicing, RNA decay and translation. Recent studies show that circRNAs play significant roles in health and disease. Owing to their universality, specificity and stability, circRNAs are becoming an ideal class of biomarkers for disease diagnosis, treatment and prognosis. Identifying the associations between circRNAs and diseases can help in understanding complex disease mechanisms. However, traditional experiments for identifying circRNA-disease associations are costly and time-consuming. Therefore, we developed a high-quality database to deposit circRNAs that are deregulated in diseases. Based on the assumption that similar circRNAs tend to be associated with similar diseases, we proposed four computational models to predict potential circRNA-disease associations from a bioinformatics perspective; these predictions provide guidance for biological experiments on specific circRNAs in medical and biological research. This thesis lays a foundation for drug development and clinical diagnosis. The main contributions are as follows:

(1) The CircR2Disease database was developed to deposit high-quality circRNA-disease associations, and a brief analysis of the deposited associations is provided. Because experimentally validated disease-associated circRNAs are scattered across the published literature, there is a lack of platforms and resources dedicated to collecting circRNA-disease information. We manually curated experimentally validated circRNA-disease associations from the existing literature and established an online sharing platform, the CircR2Disease database. The platform is used to collect, store and manage circRNA-disease information, which makes the data easier for researchers to obtain and apply.

(2) A computational model based on the KATZ measure is proposed for human circRNA-disease association prediction (KATZHCDA), using the data in the CircR2Disease database. In the KATZHCDA model, circRNA expression profiles, disease phenotype similarity, Gaussian interaction profile (GIP) kernel similarities and known circRNA-disease associations are integrated to construct a circRNA-disease heterogeneous network. Potential circRNA-disease associations are obtained by computing the KATZ index on this heterogeneous network. LOOCV and 5-fold CV were performed to investigate the effects of these four types of similarity measures. It is anticipated that the KATZHCDA model could become an effective resource for guiding clinical experiments.

(3) Based on the topological and structural characteristics of the circRNA-disease network, we developed the BWHCDA model to predict potential associations. A circRNA regulatory similarity is introduced based on the circRNA-miRNA regulatory network, and it is combined with disease semantic similarity, GIP kernel similarities and known circRNA-disease associations to construct the circRNA-disease heterogeneous network. Because most known associations are covered by circular bipartite subgraphs in the heterogeneous network, we applied a bi-random walk on the network to predict potential circRNA-disease associations. The experimental results and case study showed that BWHCDA outperforms five other methods for circRNA-disease association prediction.

(4) Because noise in the known circRNA-disease network may influence prediction performance, we proposed a computational model that predicts potential circRNA-disease associations based on low-rank matrix recovery and a label propagation algorithm (LLPHCDA). First, a new circRNA-disease adjacency network is reconstructed by eliminating the noise with the low-rank matrix projection (LMP) method. Then, a circRNA similarity network is constructed from circRNA sequence similarity and GIP kernel similarity for circRNAs, and a disease similarity network is constructed by integrating disease semantic similarity and GIP kernel similarity for diseases. The new circRNA-disease associations are treated as labels, which are propagated in the circRNA similarity network and the disease similarity network. The results of LOOCV, 5-fold and 10-fold cross validation and a case study show that LLPHCDA can integrate effective information about various circRNAs and diseases and further improve the prediction accuracy of circRNA-disease associations.

(5) Most existing approaches for predicting circRNA-disease associations seldom integrate multiple similarity measures, and the similarity matrices and the circRNA-disease association matrix are very sparse. We therefore put forward a novel method named MSFCNN for circRNA-disease prediction, which applies a two-layer convolutional neural network to a feature matrix that integrates multiple similarity kernels and interaction features among circRNAs, miRNAs and diseases. First, four circRNA similarity matrices and seven disease similarity matrices are constructed based on the biological and topological properties of circRNAs and diseases, respectively. These similarity matrices are then fused with the similarity kernel fusion (SKF) method. Based on three biological premises about circRNAs, diseases and miRNAs, a feature matrix is constructed for each circRNA-disease pair. The two-layer convolutional neural network learns deep and complex representations of circRNA-disease associations, from which potential associations are obtained. The experimental results show that MSFCNN outperforms traditional machine learning methods, including SVM, RF and MLP.

Overall, the four circRNA-disease association prediction models, which are built on multiple biological premises and combine biological and topological information, perform well in both performance evaluation and case studies. Moreover, they still perform well when the circRNA-disease association network is very sparse. Therefore, they are suitable for the task of circRNA-disease association prediction.
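
As an illustration of the kind of computation described in contribution (2), the following is a minimal sketch of a truncated KATZ index on a circRNA-disease heterogeneous network. It is not the KATZHCDA implementation: it uses only GIP kernel similarities (the expression-profile and phenotype similarities are omitted), and the function names, the damping factor `beta`, the walk-length cutoff `max_len` and the toy association matrix are all assumptions made for this example.

```python
import math


def gip_kernel(profiles):
    """Gaussian interaction profile kernel between the rows of a 0/1 matrix."""
    n = len(profiles)
    # Bandwidth is normalised by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    gamma = 1.0 / mean_sq_norm if mean_sq_norm > 0 else 1.0
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q)))
         for q in profiles]
        for p in profiles
    ]


def transpose(x):
    return [list(col) for col in zip(*x)]


def matmul(x, y):
    cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


def katz_scores(assoc, beta=0.01, max_len=3):
    """Truncated Katz score for every (circRNA, disease) pair.

    assoc is a circRNA-by-disease 0/1 matrix. beta and max_len are
    illustrative values (assumptions), not values taken from the thesis.
    """
    n_c, n_d = len(assoc), len(assoc[0])
    assoc_t = transpose(assoc)
    sim_c = gip_kernel(assoc)      # circRNA-circRNA similarity
    sim_d = gip_kernel(assoc_t)    # disease-disease similarity
    # Heterogeneous adjacency matrix: [[sim_c, assoc], [assoc^T, sim_d]]
    hetero = [sim_c[i] + list(assoc[i]) for i in range(n_c)]
    hetero += [assoc_t[j] + sim_d[j] for j in range(n_d)]
    size = n_c + n_d
    total = [[0.0] * size for _ in range(size)]
    power = hetero                 # hetero ** 1, i.e. walks of length 1
    for length in range(1, max_len + 1):
        weight = beta ** length    # longer walks contribute less
        total = [[t + weight * p for t, p in zip(t_row, p_row)]
                 for t_row, p_row in zip(total, power)]
        if length < max_len:
            power = matmul(power, hetero)
    # The upper-right block holds the circRNA-to-disease scores.
    return [row[n_c:] for row in total[:n_c]]


if __name__ == "__main__":
    # Toy data: 4 circRNAs (rows) and 3 diseases (columns).
    toy_assoc = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
    for row in katz_scores(toy_assoc):
        print([round(score, 5) for score in row])
```

Pairs without a known association receive a non-zero score through walks that pass through similar circRNAs or similar diseases; ranking those scores yields candidate associations. A small `beta` keeps the contribution of long walks low.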
Keywords/Search Tags:circRNA, heterogeneous network, KATZ model, random walk, low-rank matrix, label propagation, similarity kernel fusion, feature matrix, convolutional neural network