
Identification Of The A-to-I RNA Editing Sites Based On Support Vector Machine And Large-scale Detection Of Human Tissue-specific A-to-I RNA Editing Events

Posted on: 2011-06-28
Degree: Master
Type: Thesis
Country: China
Candidate: G H Feng
GTID: 2120360308974956
Subject: Biosafety
Abstract/Summary:
With advances in technology and bioinformatics, biological research is gradually entering the post-genomic era. Scientists now focus not only on translated genes but also on the mechanisms that regulate their expression. As a post-transcriptional modification, RNA editing can regulate gene expression by recoding gene sequences. In general, two kinds of editing patterns have been found: insertions and deletions (InDels) of nucleotides, and base modifications. A classical example of the former is the insertion and deletion of uridine residues in mitochondrial transcripts of kinetoplastid protozoa under the guidance of guide RNA (gRNA), which is also the earliest editing pattern discovered. Cytidine-to-uridine (C-to-U) conversions have been found in plant mitochondria and chloroplasts. Adenosine-to-inosine (A-to-I) modifications are the most common editing type in the cellular mRNAs of numerous eukaryotic species. This reaction creates an I:U mismatch, destabilizes the double-helical structure and thus "unwinds" the duplex.

The first part of our work is to build a classifier based on the Support Vector Machine (SVM) to identify A-to-I RNA editing sites. From the published literature, we found that edited and unedited sites differ in the base frequencies, conservation scores and secondary structures of the sequences flanking the site. We collected 254 non-redundant A-to-I RNA editing sites and 1,456 non-redundant unedited sites, which were confirmed by molecular experiments. To construct the classifier, we chose the Radial Basis Function (RBF) kernel and built the model with LIBSVM. Because leave-one-out cross-validation is considered the most rigorous test of performance, we used it to evaluate the classifier.
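The leave-one-out procedure can be illustrated with a minimal sketch. It is not the thesis's LIBSVM model: the scorer below is a simple stand-in that compares mean RBF-kernel similarity to edited and unedited training sites, and the function names, the `gamma` value, the zero threshold and the two-dimensional toy features are all assumptions made for illustration only.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def kernel_score(train_x, train_y, query, gamma=0.5):
    """Stand-in scorer (NOT an SVM): mean kernel similarity to edited
    training sites minus mean similarity to unedited training sites."""
    pos = [rbf_kernel(x, query, gamma) for x, y in zip(train_x, train_y) if y == 1]
    neg = [rbf_kernel(x, query, gamma) for x, y in zip(train_x, train_y) if y == 0]
    return sum(pos) / len(pos) - sum(neg) / len(neg)

def leave_one_out(xs, ys, gamma=0.5):
    """Hold out each site once, fit on the rest, and score the held-out site."""
    scores = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        scores.append(kernel_score(train_x, train_y, xs[i], gamma))
    return scores

def summarize(scores, ys, threshold=0.0):
    """Return (sensitivity, specificity, accuracy) at a fixed score threshold."""
    tp = sum(1 for s, y in zip(scores, ys) if y == 1 and s > threshold)
    fn = sum(1 for s, y in zip(scores, ys) if y == 1 and s <= threshold)
    tn = sum(1 for s, y in zip(scores, ys) if y == 0 and s <= threshold)
    fp = sum(1 for s, y in zip(scores, ys) if y == 0 and s > threshold)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(ys)

def auc(scores, ys):
    """AUC as the fraction of (edited, unedited) pairs ranked correctly;
    tied scores count as half a correct pair."""
    pos = [s for s, y in zip(scores, ys) if y == 1]
    neg = [s for s, y in zip(scores, ys) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy feature vectors (hypothetical): 1 = edited site, 0 = unedited site.
xs = [(1.0, 0.9), (0.9, 1.1), (1.1, 1.0),
      (0.0, 0.1), (0.1, -0.1), (-0.1, 0.0), (0.2, 0.1)]
ys = [1, 1, 1, 0, 0, 0, 0]

loo_scores = leave_one_out(xs, ys)
sensitivity, specificity, accuracy = summarize(loo_scores, ys)
print(sensitivity, specificity, accuracy, auc(loo_scores, ys))
```

With n sites the loop fits n models, each on n - 1 sites, so every site is scored by a model that never saw it during training.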
For performance evaluation, we used the standard measures routinely applied to other prediction methods: sensitivity, specificity and total accuracy. In addition, the receiver operating characteristic (ROC) curve, a well-known threshold-independent measure, was used to evaluate performance, and the area under the ROC curve (AUC) was calculated. Under these measures, the SVM-based model achieved an accuracy of 80% and an AUC of 0.85. When evaluated on independent data, the model achieved an accuracy of 70% and an AUC of 0.75.

The second objective was to find human tissue-specific A-to-I RNA editing sites. We collected 32,316 non-redundant A-to-I RNA editing sites, identified by different methods, from the published literature. We used a Bayesian statistical approach and Fisher's exact test to identify tissue-specific sites. Using the resulting LOD score and p-value, we identified 340 tissue-specific A-to-I RNA editing sites in 36 human tissues at a LOD score above 2 and a significance level of α = 0.05. As a stricter standard, we corrected for multiple testing using the false discovery rate (FDR), which left 23 tissue-specific A-to-I RNA editing sites in 14 human tissues. In the functional analysis, we found a tissue-specific RNA editing site located in a miRNA target region.

In this project, we built the first SVM-based classifier to identify A-to-I RNA editing sites and the first statistical model to identify tissue-specific A-to-I RNA editing sites. One advantage of the classifier is that it can be used independently of transcriptional data. This work also has some deficiencies. For example, the classifier performs less well on the independent data than on the training data, and the tissue-specific A-to-I RNA editing sites still need to be confirmed by molecular experiments.
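The p-value half of the tissue-specificity screen can be sketched as follows. The abstract does not give the layout of the contingency table or the definition of the LOD score, so this example assumes a hypothetical 2x2 table per site and tissue (edited versus unedited transcripts, in the tissue of interest versus all other tissues), a one-sided Fisher's exact test, and the Benjamini-Hochberg procedure as the FDR method. The counts are invented for illustration, and the LOD score is not reproduced.

```python
import math

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the hypergeometric probability of seeing at least `a` counts
    in the top-left cell when all row and column totals are held fixed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)
    tail = sum(math.comb(row1, k) * math.comb(row2, col1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / total

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts for one site: (edited in tissue, unedited in tissue,
# edited elsewhere, unedited elsewhere). These are not data from the thesis.
tables = {
    "tissue_A": (12, 8, 5, 75),
    "tissue_B": (2, 18, 15, 65),
    "tissue_C": (3, 17, 14, 66),
}

tissues = list(tables)
p_values = [fisher_exact_greater(*tables[t]) for t in tissues]
q_values = benjamini_hochberg(p_values)
alpha = 0.05
specific = [t for t, q in zip(tissues, q_values) if q <= alpha]
print(specific)
```

Filtering on the adjusted values rather than on the raw p-values is what makes the FDR step stricter than the uncorrected α = 0.05 cut-off.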
Keywords/Search Tags:A-to-I RNA editing sites, Support Vector Machine, tissue-specific, Bayesian statistics, Fisher exact test