Prediction of CircRNA and Disease Associations Based on Machine Learning

Posted on: 2021-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Fang
GTID: 2510306041461394
Subject: Computer application technology
Abstract/Summary:
With the development of high-throughput sequencing technology, more and more circular RNA (circRNA) molecules have been found in a wide range of eukaryotic cells, and the biological functions of circRNA are gradually becoming known. A main biological characteristic of circRNA is that it can act as a microRNA (miRNA) sponge. It also participates in transcriptional regulation and parental gene modification. The endogeneity, abundance, conservation and stability of circRNA make it a potential biological marker of disease. Against this background, the recognition of protein binding sites on circRNA, the prediction of associations between circRNA and related proteins, and the prediction of associations between circRNA and disease have become main topics for a large number of researchers. In addition, recent studies have shown that circRNA is related to human disease genes and plays an important role in predicting drug targets. It is therefore necessary to predict the associations between circRNA and diseases, which will help us understand the minimum requirements of cell life activities and find new methods for treating diseases. However, traditional biological methods for identifying circRNA-disease associations are inefficient and expensive, and although a few computational methods exist, their accuracy and efficiency still need to be improved. This thesis focuses on further improving the accuracy and efficiency of predicting circRNA-disease associations. The main research work is as follows:

(1) A path-weight-based method, named PWCDA, is proposed. It combines the gene ontology (GO) data of circRNA target genes, disease-related gene data and known circRNA-disease association data to make full use of each circRNA-disease association pair. First, the disease similarity network, the circRNA similarity network and the circRNA-disease association network are constructed. After these three subnetworks are assembled into a heterogeneous network, the association score of each circRNA-disease pair is calculated from the weights of the paths connecting the pair in the heterogeneous network. Because similarities between circRNAs and between diseases that depend only on the relevant biological data are not sufficient, the biological data are combined with network topological similarity. The experimental results show that this method can significantly improve the accuracy of circRNA-disease association prediction.

(2) A prediction method based on collaborative filtering over multiple data networks, named ICFCDA, is proposed. First, the functional annotation semantic similarity, sequence similarity and GIP similarity of circRNAs are calculated from the ontology data of genes related to circRNA target genes, the base sequence data of circRNAs and the known circRNA-disease associations, respectively. Second, the disease functional similarity and the disease GIP similarity are calculated from disease-gene associations and circRNA-disease associations, respectively. In addition, disease names are replaced with disease ontology IDs so that the disease semantic similarity can be calculated with the DOES tool. Third, the multiple disease similarities and circRNA similarities are integrated into a final disease similarity matrix and a final circRNA similarity matrix. Finally, each circRNA-disease pair is scored for possible association by the improved collaborative filtering method. The experimental results show that this method improves both the efficiency and the accuracy of circRNA-disease association prediction.

(3) A new method based on the gradient boosting decision tree (GBDT), named GBDTCDA, is proposed to predict circRNA-disease associations. First, the circRNA similarity network (CSN) is constructed from circRNA expression profile, gene ontology and base sequence data, and the disease similarity network (DSN) is constructed from disease-related ontology data and disease-related genes. Second, four kinds of information are extracted: the statistical information of the CSN, the DSN and the circRNA-disease association network; the graph theory information of the CSN and the DSN; representative biological indicators of the circRNA-disease association network, such as GC content and k-mer; and the latent vectors extracted from the circRNA-disease association network. This information is used to build the feature vector of each circRNA-disease pair. Some of these feature vectors are used to train the model, and the rest are used as test data. Analysis of the prediction results on the test data shows that this method significantly improves the accuracy of predicting circRNA-disease associations.
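The GIP similarity used in (2) can be illustrated with a minimal sketch. It assumes that GIP denotes the Gaussian interaction profile kernel as it is commonly defined in association-prediction work, with the bandwidth normalized by the mean squared norm of the interaction profiles. The function name `gip_similarity`, the parameter `gamma_prime` and the toy association matrix are hypothetical and are not taken from the thesis.

```python
import math


def gip_similarity(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel similarity between rows.

    `profiles` is a list of equal-length 0/1 rows; row i is the
    interaction profile of entity i (for example, one circRNA's
    known associations with every disease).
    """
    n = len(profiles)
    # Squared Euclidean norm of each interaction profile.
    sq_norms = [sum(x * x for x in row) for row in profiles]
    mean_sq = sum(sq_norms) / n
    if mean_sq == 0:
        raise ValueError("association matrix contains no known associations")
    # Assumed normalization: bandwidth scaled by the mean squared norm.
    gamma = gamma_prime / mean_sq
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * dist_sq)
    return sim


# Toy circRNA-by-disease association matrix (made-up values).
assoc = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
circ_sim = gip_similarity(assoc)
# Disease profiles are the columns of the same matrix.
disease_sim = gip_similarity([list(col) for col in zip(*assoc)])
```

In this toy matrix the first two circRNAs share a disease and therefore receive a higher similarity than the first and third, which share none.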
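The GC content and k-mer indicators mentioned in (3) can be sketched as follows. This is only an illustration under stated assumptions: the sequence is treated as a linear string (the circular junction is ignored), k-mer values are normalized frequencies over the alphabet A, C, G, U, and the function names and the example sequence are hypothetical rather than the thesis's own feature pipeline.

```python
from collections import Counter
from itertools import product


def gc_content(seq):
    """Fraction of bases in `seq` that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_frequencies(seq, k=2, alphabet="ACGU"):
    """Normalized k-mer frequencies in a fixed lexicographic order."""
    # Accept DNA-style input by mapping T to U (an assumption of this sketch).
    seq = seq.upper().replace("T", "U")
    total = max(len(seq) - k + 1, 0)
    counts = Counter(seq[i:i + k] for i in range(total))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    return [counts[m] / total if total else 0.0 for m in kmers]


# Made-up example sequence, not real circRNA data.
seq = "AUGGCCAU"
features = [gc_content(seq)] + kmer_frequencies(seq, k=2)
```

The resulting list has one GC value followed by 4^k k-mer frequencies, which could be concatenated with network-derived features to form a feature vector for a tree-based model.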
Keywords: disease prediction, machine learning, circRNA-disease associations