
Prediction of Bacterial sRNA Targets Using Gene Expression Profiles

Posted on: 2012-11-18; Degree: Master; Type: Thesis
Country: China; Candidate: J Y Wu; Full Text: PDF
GTID: 2214330371463010; Subject: Biochemistry and Molecular Biology
Abstract/Summary:
Bacterial sRNAs are a widespread class of regulatory RNAs, 40 to 500 nucleotides in length. Through strategies that combine bioinformatics prediction with experimental validation, a growing number of sRNAs have been found to play important roles in a variety of physiological processes by binding to target mRNAs or proteins. These processes include regulation of outer membrane protein expression, iron homeostasis, quorum sensing and bacterial virulence. Depending on the position of the imperfectly base-paired binding region between an sRNA and its target mRNA, the sRNA can either activate or repress the target gene post-transcriptionally. In addition, most sRNA-mRNA interactions require the chaperone Hfq to stabilize the sRNA and promote binding.

sRNA targets can currently be identified with experimental protocols or with bioinformatics models. Experimental methods, which include chromosomal reporter gene fusions, affinity precipitation, microarrays and proteomics, have the merit of providing direct evidence for an sRNA-target interaction, but genome-wide identification by these methods is time-consuming and labour-intensive. Bioinformatics prediction has the merit of quickly supplying candidates for experimental confirmation. The current trend is to combine the two approaches, so developing an effective model for predicting sRNA targets is very important. To our knowledge, five sequence-based models have been presented, and some of them achieve relatively high prediction accuracy. However, sequence-based prediction models still have two shortcomings. The first is that they return a large number of potential targets for most sRNAs, which makes experimental verification difficult.
The second is that the predicted targets are not guaranteed to be functional, because most sRNAs are expressed only under particular conditions.

To address these two problems, we report two pieces of work. First, we constructed a comprehensive database of experimentally verified sRNA targets. Second, we constructed a prediction model, sTarExp, based on gene expression profiles.

To construct the database, we systematically read peer-reviewed papers on sRNA research and extracted detailed information such as binding regions and mutation positions. We then built the database, sRNATarBase, using PHP and MySQL. The database currently contains 11 protein targets and 381 mRNA targets. It not only supports functional studies of sRNAs but also contributes a benchmark training dataset for developing sRNA target prediction models.

To construct the prediction model from gene expression profiles, we carefully checked the entries of sRNAMap, a comprehensive database of sRNAs, and then selected the expression profile dataset GSE3665 from the GEO database as the research object. Combining the information in GSE3665 with the entries in sRNATarBase, we extracted a training dataset of 64 interactions as positive samples and 158 non-interactions as negative samples. In theory, the expression levels of an sRNA and its target mRNAs should be closely related. We therefore present a method, called the random correlation coefficient method, for constructing new features from the original dataset, and used it to generate 1000 new features. The final dataset contained 1000 features for 64 sRNA-mRNA interactions and 158 non-interactions. We then constructed the model using the Naïve Bayes method and a forward feature selection procedure, with leave-one-out cross-validation classification accuracy as the objective function.
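The selection procedure described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the Gaussian likelihood, the variance smoothing term `eps`, the stopping rule (stop when accuracy no longer improves), all function names, and the toy data are assumptions, since the abstract does not specify them.

```python
import math


def gaussian_nb_predict(train_X, train_y, x, eps=1e-6):
    """Predict the label of x with a Gaussian naive Bayes model (assumed likelihood)."""
    best_label, best_score = None, -math.inf
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        score = math.log(len(rows) / len(train_X))  # log prior of the class
        for j, value in enumerate(x):
            col = [r[j] for r in rows]
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + eps  # smoothed variance
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def loocv_accuracy(X, y, features):
    """Leave-one-out classification accuracy using only the given feature indices."""
    sub = [[row[j] for j in features] for row in X]
    hits = 0
    for i in range(len(sub)):
        train_X = sub[:i] + sub[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += gaussian_nb_predict(train_X, train_y, sub[i]) == y[i]
    return hits / len(sub)


def forward_select(X, y, max_features=5):
    """Greedily add the feature that most improves LOOCV accuracy; stop when none does."""
    selected, best_acc = [], 0.0
    n_features = len(X[0])
    while len(selected) < max_features:
        scored = [(loocv_accuracy(X, y, selected + [f]), f)
                  for f in range(n_features) if f not in selected]
        if not scored:
            break
        acc, feat = max(scored, key=lambda t: t[0])  # ties go to the lowest index
        if acc <= best_acc:
            break
        selected.append(feat)
        best_acc = acc
    return selected, best_acc


# Toy data (hypothetical): feature 1 separates the classes, feature 0 does not.
X = [[0.50, -0.90], [0.10, -0.80], [0.40, -0.70], [0.20, -0.85],
     [0.45, 0.90], [0.15, 0.80], [0.35, 0.70], [0.25, 0.75]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
selected, acc = forward_select(X, y)
print(selected, acc)
```

Leave-one-out scoring is expensive for 1000 candidate features, so a real run would likely cache per-feature statistics; the sketch favours clarity over speed.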
To identify the optimal feature set and the corresponding model, we carried out a stability analysis, which showed that the highest stability index, 0.7806, was obtained with five features, namely 33, 270, 391, 438 and 958. This feature set was selected as optimal, and the corresponding 1000 classifiers from the stability analysis were taken as the final prediction model, named sTarExp. A potential sRNA-mRNA complex is predicted to be positive if more than 500 of the classifiers predict it as positive. Using sTarExp, 23 positive samples (TP=23, FN=41) and 155 negative samples (TN=155, FP=3) from the training dataset were correctly predicted. The prediction accuracy, sensitivity, specificity and positive predictive value (PPV) were therefore 79.28% ((TP+TN)/(TP+TN+FP+FN)), 35.94% (TP/(TP+FN)), 98.1% (TN/(TN+FP)) and 88.46% (TP/(TP+FP)), respectively. This accuracy was higher than the 70.00% of Zhang's model and the 66.7% of the program TargetRNA on their respective training datasets, but lower than the 91.67% of our previous model sRNATargetNB. To improve prediction performance, we also present an integrated scheme that takes the intersection of the sequence-based and expression profile-based prediction results.

To demonstrate the performance of sTarExp and the integrated scheme, both sTarExp and sRNATarget were applied to every combination of 47 sRNAs and 4023 mRNAs extracted from dataset GSE3665. For sTarExp, the number of predicted targets ranged from 5 to 566 (average 111) for P=1.00, from 33 to 1223 (average 311) for P=0.95, and from 48 to 1860 (average 614) for P=0.50. With the integrated scheme, however, the average number of targets was only 5 for P=1.00, 20 for P=0.95 and 68 for P=0.50. In addition, the positive predictive value of the integrated scheme was higher than that of either individual model, sTarExp or sRNATarget.
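The voting rule and the four performance measures quoted above can be checked with a short sketch. The function names are hypothetical; the formulas and the counts (TP=23, FN=41, TN=155, FP=3) are taken from the text. With these counts, the sensitivity, specificity and PPV formulas reproduce the reported values, whereas the accuracy formula evaluates to about 80.18% rather than the reported 79.28%, so one of the quoted figures may contain a transcription error.

```python
def ensemble_vote(votes, threshold=500):
    """Return True if more than `threshold` classifiers voted positive."""
    return sum(1 for v in votes if v) > threshold


def confusion_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, specificity and PPV from confusion counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


# Counts as stated in the abstract.
metrics = confusion_metrics(tp=23, fn=41, tn=155, fp=3)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```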
The PPV of the integrated scheme was much higher than that of either individual model. The results therefore clearly show that the integrated scheme provides better support for experimental verification of sRNA targets.

Detailed information on the targets predicted by sTarExp and the integrated scheme is available on our web page (http://ccb.bmi.ac.cn/starexp/)...
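The integrated scheme, described above as the intersection of the sequence-based and expression-based predictions, can be illustrated with plain set operations. This is a sketch under assumptions: the function names, the dictionary layout and the placeholder identifiers (`sRNA_A`, `mRNA_1`, and so on) are invented for illustration and are not predictions from either model.

```python
def integrate(expression_pred, sequence_pred):
    """Keep, for each sRNA, only the targets predicted by both models."""
    return {
        srna: targets & sequence_pred.get(srna, set())
        for srna, targets in expression_pred.items()
    }


def average_targets(pred):
    """Average number of predicted targets per sRNA."""
    return sum(len(targets) for targets in pred.values()) / len(pred)


# Placeholder predictions (hypothetical identifiers, not real results).
expression_pred = {
    "sRNA_A": {"mRNA_1", "mRNA_2", "mRNA_3", "mRNA_4"},
    "sRNA_B": {"mRNA_2", "mRNA_5"},
}
sequence_pred = {
    "sRNA_A": {"mRNA_2", "mRNA_4", "mRNA_9"},
    "sRNA_B": {"mRNA_7"},
}

integrated = integrate(expression_pred, sequence_pred)
print(integrated)
print(average_targets(expression_pred), average_targets(integrated))
```

Intersecting the two prediction sets can only shrink each candidate list, which is consistent with the smaller average target counts reported for the integrated scheme.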
Keywords: sRNA, target, prediction, gene expression, sRNATarBase