Identification Of The Functional Relevance Of CircRNA With Disease And SNP In The Human Genome

Posted on: 2021-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Liu
Full Text: PDF
GTID: 2404330629480108
Subject: Biology
Abstract/Summary:
Circular RNA (circRNA) is a rising star of the RNA family, and many studies have found that it is involved in a wide range of biological functions, especially regulatory ones. Numerous studies have also shown that circRNA is associated with human diseases, including some complex diseases. However, few experiments have verified the connection between circRNAs and diseases, and identifying disease-related circRNAs through biological experiments is time-consuming and impractical. Because existing computational methods show performance deficiencies, a feasible and effective computational method for predicting circRNA-disease associations is worth further study. There is also evidence that single nucleotide polymorphisms (SNPs) on circRNAs can cause abnormal circRNA expression, leading to the occurrence of some diseases. These findings imply that SNPs, as intermediate inducing factors, may link circRNAs with diseases. Because SNPs on circRNAs may play an important role, it is necessary to explore their effect on the biological functions of circRNAs. This thesis addresses these two gaps in existing research, and its main tasks are as follows:

(1) A new method, SIMCCDA (Speedup Inductive Matrix Completion for CircRNA-Disease Associations prediction), was proposed to predict associations between circRNAs and diseases. Data were collected from three databases (circRNADisease, Circ2Disease and CircR2Disease) to construct datasets of known circRNA-disease associations. Based on the known circRNA-disease associations, circRNA sequence similarity, disease semantic similarity, and Gaussian kernel similarity, the main feature vectors are extracted through principal component analysis (PCA). The speedup inductive matrix completion (IMC) method is then used to build the final model. SIMCCDA obtained an area under the receiver operating characteristic curve (AUC) of 0.8465 under leave-one-out cross-validation (LOOCV) on the dataset. It also surpasses other advanced methods in predicting circRNA-disease associations. In addition, case studies of breast cancer, stomach cancer and colorectal cancer were conducted to further evaluate predictive performance. All experimental results show that SIMCCDA has reliable prediction capabilities. We hope that SIMCCDA can promote further development in the field and support follow-up investigations by biomedical researchers. The relevant data and code of this work are available at https://github.com/bioinformaticsAHU/SIMCCDA.

(2) We annotated the SNPs related to human circRNAs and explored their impact on the potential functions of circRNAs. We obtained human circRNA data from circBase and circBank and SNP data from dbSNP, and annotated SNPs on circRNAs according to their genomic coordinates. We identified 40,409,038 SNPs on 140,407 human circRNAs. We analyzed the differences in SNP distribution across different circRNAs and found that mutations were less likely to occur in conserved circRNAs and intragenic circRNAs. miRNA (microRNA) target prediction tools were used to evaluate the effect of SNPs on circRNA-miRNA interactions. Among the identified SNPs, 23,945,458 destroyed circRNA-miRNA binding sites and 22,512,946 created new circRNA-miRNA binding sites. These results indicate that SNPs can cause circRNAs to lose or gain biological functions, and such effects may underlie associations between circRNAs and diseases. In addition, a comparison of SNP density between circRNA-miRNA binding regions and their flanking regions indicated that the binding regions are relatively conserved, because they are needed to perform biological functions, and that fewer mutations occur in them. Using the GWAS Catalog database, 4,481 SNPs significantly associated with a disease or trait were identified on circRNAs. TagSNPs, each of which can represent a group of SNPs, were further screened out so that researchers can carry out subsequent biological experiments. These results suggest that circRNA-related SNPs play an important role and may provide a new perspective for revealing the pathogenesis of diseases.
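The coordinate-based SNP annotation described in task (2) can be illustrated with a minimal Python sketch. This is not the thesis code. The function name `annotate_snps`, the tuple layouts, the 0-based half-open interval convention, and the toy records are all assumptions made for illustration, and the sketch treats a circRNA as its full genomic span, which may differ from how the thesis handled exons and introns.

```python
from bisect import bisect_right
from collections import defaultdict


def annotate_snps(circrnas, snps):
    """Map each circRNA ID to the IDs of SNPs located inside its genomic span.

    circrnas: iterable of (chrom, start, end, circ_id), 0-based half-open (assumed).
    snps:     iterable of (chrom, pos, snp_id), 0-based position (assumed).
    """
    # Group circRNA intervals by chromosome and sort them by start coordinate.
    by_chrom = defaultdict(list)
    for chrom, start, end, circ_id in circrnas:
        by_chrom[chrom].append((start, end, circ_id))
    starts = {}
    for chrom, items in by_chrom.items():
        items.sort()
        starts[chrom] = [s for s, _, _ in items]

    hits = defaultdict(list)
    for chrom, pos, snp_id in snps:
        items = by_chrom.get(chrom, [])
        # Only intervals whose start is <= pos can contain the SNP.
        upper = bisect_right(starts.get(chrom, []), pos)
        # circRNAs may overlap one another, so every candidate is checked.
        for start, end, circ_id in items[:upper]:
            if pos < end:
                hits[circ_id].append(snp_id)
    return dict(hits)


# Hypothetical toy records, not taken from circBase, circBank, or dbSNP.
circrnas = [
    ("chr1", 100, 200, "circA"),
    ("chr1", 150, 300, "circB"),
    ("chr2", 50, 80, "circC"),
]
snps = [
    ("chr1", 120, "rs1"),
    ("chr1", 160, "rs2"),
    ("chr1", 299, "rs3"),
    ("chr1", 300, "rs4"),
    ("chr2", 10, "rs5"),
]
annotation = annotate_snps(circrnas, snps)
```

In this toy input, `rs2` lies in the overlap of two circRNAs and is assigned to both, while `rs4` sits exactly at an exclusive end coordinate and is assigned to none. At the scale reported above (tens of millions of SNPs), an interval tree or a dedicated genome-arithmetic tool would likely be a better fit than this linear candidate scan.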
Keywords/Search Tags:CircRNA-disease association, Similarity calculation, IMC, SNP, Function effect