
Research On Prediction Of Association Between Circular RNA And Disease In Human Genome

Posted on: 2022-10-15
Degree: Master
Type: Thesis
Country: China
Candidate: S H Li
GTID: 2480306524982439
Subject: Biophysics
Abstract/Summary:
Circular RNA (circRNA) is a novel type of non-coding RNA that lacks a 5' cap and a 3' tail. A large number of studies have shown that circRNAs are associated with human diseases. In addition, research indicates that circRNAs are stable, tissue-specific and highly conserved. CircRNAs are therefore regarded as promising biomarkers of human disease that can guide the development of molecularly targeted drugs and improve understanding of pathogenesis. Current experimental methods for confirming associations between circRNAs and diseases include exonuclease digestion, polyacrylamide gel electrophoresis and Northern blotting. However, only a few circRNA-disease associations have been identified so far, even though the complexity of biological networks implies that circRNAs must participate in the occurrence and development of many diseases. These wet-lab methods are also costly and labor-intensive.

Advances in data mining and artificial intelligence provide an opportunity to discover associations between circRNAs and diseases computationally. Although existing computational approaches have made some progress, they exploit only the similarity features of circRNAs and diseases and ignore circRNA sequence information. Moreover, they do not propose a reliable scheme for selecting negative samples. In this thesis, we constructed a model that predicts circRNA-disease associations from sequence features of circRNAs and similarity features of diseases. We first collected known associations from the CircR2Disease database and circRNA sequences from circBase. After data cleaning, we used the k-nucleotide composition algorithm to extract features from the circRNA sequences, and then computed an integrated circRNA similarity that combines Tanimoto similarity with Gaussian interaction profile similarity. Diseases were represented by integrating their semantic similarity with their Gaussian interaction profile similarity. Based on the hypothesis that less similar diseases are not associated with the same circRNA, we selected a reliable negative dataset. Finally, we constructed an SVM classifier and evaluated the model with 5-fold cross-validation and the receiver operating characteristic (ROC) curve.

Because the first model uses only the sequence features of circRNAs and does not integrate the topological structure of the original association network, we introduced a second algorithm based on the graph autoencoder (GAE). After obtaining the embedding of every node in the original network through the GAE, we adopted a link-prediction formulation in which each circRNA-disease pair is combined into one sample and used as input to the SVM. All data and code are published on GitHub at https://github.com/yizhishagua/circRNA-disease-project4.
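The feature-extraction steps named in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the function names, the choice of `k`, the real-valued form of the Tanimoto coefficient, and the bandwidth rule for the Gaussian interaction profile (GIP) kernel (`gamma_prime` divided by the mean squared profile norm, a convention often used with this kernel) are all assumptions made for illustration, and the toy sequences and association profiles are hypothetical.

```python
from itertools import product
from math import exp


def kmer_composition(seq, k=2):
    """Return normalized k-mer frequencies over the ACGT alphabet (length 4**k)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA spellings alike
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def tanimoto(a, b):
    """Tanimoto coefficient for real-valued vectors: a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0


def gip_similarity(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel over rows of a 0/1 association matrix.

    Assumed bandwidth rule: gamma = gamma_prime / mean squared norm of the profiles.
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm else 0.0
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((x - y) ** 2 for x, y in zip(profiles[i], profiles[j]))
            sim[i][j] = exp(-gamma * d2)
    return sim


if __name__ == "__main__":
    # Hypothetical toy data: two short sequences and three association profiles.
    f1 = kmer_composition("ACGUACGUAC", k=2)
    f2 = kmer_composition("ACGUUUGCAC", k=2)
    print("Tanimoto similarity of k-mer features:", tanimoto(f1, f2))
    print("GIP similarity matrix:", gip_similarity([[1, 0], [1, 0], [0, 1]]))
```

How the two similarities are weighted when they are combined, and which value of `k` is used, are not stated in the abstract, so the sketch leaves that integration step out.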
Keywords/Search Tags: circRNA-disease association prediction, Graph Autoencoder, Medical Subject Headings, Machine learning, Link prediction