Identification Of The Formation And The Protein-coding Potential Of Circular RNAs Using Deep Learning

Posted on: 2022-09-01
Degree: Master
Type: Thesis
Country: China
Candidate: C J Tan
GTID: 2480306740979819
Subject: Biomedical engineering
Abstract/Summary:
Circular RNAs (circRNAs) are endogenous RNAs with a covalently closed structure that play a variety of roles in organisms through biological pathways. Current tools for identifying circRNAs fall into two categories: those based on high-throughput sequencing data, and those that make computational predictions from sequence characteristics. Each category has its own advantages and disadvantages. We integrated biological factors reported to affect circRNA formation, drawn from published papers and from the tools above, with high-order abstract features extracted from sequences by convolutional neural networks, and constructed a machine learning classification model (circPred) to identify circRNAs. On the validation dataset, circPred distinguished circRNAs from linear coding RNAs better than several other models and tools. We also studied some characteristics of the mechanism of circRNA formation.

Although circRNAs were initially considered non-coding RNAs, some have since been found to produce peptides in vivo that have important biological functions. Most current tools for evaluating RNA translation potential were developed for linear transcripts and do not account for the influence of the circular structure of circRNAs. Only two tools, CircPro and CircCode, were developed specifically to recognize the translation potential of circRNAs, and both require Ribo-Seq data and are not user-friendly. Given the small number of circRNAs with validated translation, we improved the Tri-training semi-supervised algorithm and developed a tool designed to identify the translational potential of circRNAs from sequence features alone.

Our results include:

(1) A computational framework (circPred) was proposed that integrates manually extracted features with high-order features abstracted automatically by deep learning. circPred can effectively distinguish circRNAs from linear protein-coding RNAs. Sequence composition characteristics, splicing signals, Alu repeats, and A-to-I RNA editing were also compared between circRNAs and protein-coding RNAs, and all of these features differed significantly between the two groups. Motif analysis of the sequences found that some motifs could be involved in the formation of circRNAs.

(2) An improved Tri-training semi-supervised method was used to identify the translational potential of circRNAs from traditional sequence coding features. Ranking the features by importance showed that traditional sequence coding features can also be used to identify the translational potential of circRNAs. We analyzed the distribution of GC content, IRES elements, and m6A methylation modification sites in the positive and negative samples, both over whole sequences and within different regions (ORF and non-ORF). These characteristics differed between the two sample groups at the whole-sequence level and within the individual regions.
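The ORF versus non-ORF comparison of GC content mentioned above can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names (`gc_content`, `circular_slice`, `longest_circular_orf`, `orf_and_non_orf`) are hypothetical, and the sketch assumes an ATG start codon, the standard stop codons, and an ORF that covers at most one pass around the circle. Its main point is that an ORF in a circRNA may span the back-splice junction, so the sequence has to be read with wrap-around.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq):
    """Fraction of G/C bases in seq (0.0 for an empty string)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def circular_slice(seq, start, length):
    """Read `length` bases from a circular sequence, wrapping past the end.

    Assumes 0 <= start and start + length <= 2 * len(seq).
    """
    return (seq + seq)[start:start + length]


def longest_circular_orf(seq):
    """Return (start, length) of the longest ATG..stop ORF, or None.

    Assumption: the ORF (stop codon included) is at most one full pass
    around the circle; rolling-circle ORFs are ignored.
    """
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    best = None
    for start in range(n):
        if circular_slice(seq, start, 3) != "ATG":
            continue
        # Step codon by codon while the codon still fits in one pass.
        for offset in range(3, n - 2, 3):
            if circular_slice(seq, start + offset, 3) in STOP_CODONS:
                length = offset + 3
                if best is None or length > best[1]:
                    best = (start, length)
                break
    return best


def orf_and_non_orf(seq):
    """Split a circular sequence into (ORF, non-ORF) strings, or None."""
    seq = seq.upper().replace("U", "T")
    orf = longest_circular_orf(seq)
    if orf is None:
        return None
    start, length = orf
    n = len(seq)
    orf_seq = circular_slice(seq, start, length)
    non_orf_seq = circular_slice(seq, (start + length) % n, n - length)
    return orf_seq, non_orf_seq


# Toy circular sequence whose only ORF starts near the end and wraps
# across the junction back to the beginning.
regions = orf_and_non_orf("GCCTAACCCCATG")
if regions is not None:
    orf_seq, non_orf_seq = regions
    print("ORF GC:", gc_content(orf_seq), "non-ORF GC:", gc_content(non_orf_seq))
```

A linear ORF finder would miss the ORF in this toy sequence, because the start codon sits at the last three bases and the stop codon is only reached after wrapping around to the start.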
Keywords: circular RNAs, cyclization of RNAs, translation of circRNAs, SVM, CNN, Tri-training