Research On The Performance Of Genomic Prediction Models For Prokaryotes

Posted on: 2017-04-30   Degree: Master   Type: Thesis
Country: China   Candidate: D W Ji
GTID: 2180330485985112   Subject: Biomedical engineering
Abstract/Summary:
With the rapid development of gene sequencing technology, the volume of genome sequence data is growing explosively. Analyzing these DNA sequences first requires identifying their genes, and traditional experimental verification is too slow to meet this demand. A series of genomic prediction tools has therefore emerged, of which Prodigal, ZCURVE, GeneMark and GLIMMER are outstanding representatives. Because of limitations in their underlying principles, among other reasons, these tools can report false genes or miss ORFs that do have protein-coding function, and their performance differs between organisms with different GC content. An effective and objective evaluation of these tools is therefore needed, both to measure their performance and to provide a theoretical basis for choosing the optimal combination of prediction tools for a given organism's genome.

In this research, we extracted the DNA sequences and gene annotations of 150 organisms from the latest NCBI genome database, selected according to the distribution of GC content. Because these existing annotations also contain missing or incorrect information, we re-annotated all 150 genomes, adding genes that had been missed and removing annotated ORFs that are non-coding. The updated annotations serve as the reference data for comparing the performance of the four genomic prediction tools.

Comparing the tools individually, we found that Prodigal has the best overall performance and that this performance is highly consistent across GC contents. GLIMMER performs well on genomes with low GC content (0.10-0.35) but only moderately on those with higher GC content (0.35-0.75). In addition, ZCURVE performs outstandingly in terms of extra prediction ratio (EPR) for GC content in the range 0.35-0.55. We also explored the best combination of gene prediction tools for organism genomes with different GC content.
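The GC-content grouping used throughout the study can be illustrated with a short Python sketch. Here GC content is computed as the fraction of G and C among unambiguous A/C/G/T bases. The function names `gc_content` and `gc_band` are hypothetical. Because the abstract quotes ranges that share the boundary 0.35, placing that boundary in the higher band is an assumption.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G+C among unambiguous A/C/G/T bases (case-insensitive)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def gc_band(gc: float) -> str:
    """Map a GC fraction to the bands quoted in the abstract.

    Assumption (not from the thesis): the shared boundary 0.35 goes to the higher band.
    """
    if 0.10 <= gc < 0.35:
        return "low"
    if 0.35 <= gc <= 0.75:
        return "high"
    return "outside studied range"


if __name__ == "__main__":
    # Ambiguous bases such as N are ignored when computing the fraction.
    for s in ("ATGCGC", "AATTAATTGC", "atgn"):
        gc = gc_content(s)
        print(s, round(gc, 3), gc_band(gc))
```

Ignoring ambiguous bases keeps assembly gaps (runs of N) from artificially lowering the GC fraction of a draft genome.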
Using 130 organisms as a training set and 20 organisms as a test set, we found that three combinations perform best: Prodigal plus GeneMark, Prodigal plus GLIMMER, and GeneMark plus GLIMMER. Compared with single-tool predictions, the combined predictions show advantages in both predictive accuracy and extra prediction ratio (EPR). Finally, based on these findings, we developed an automatic tool for gene re-annotation and an online service for genomic prediction.
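How two tools' predictions might be merged and scored can be sketched as follows. This is not the thesis's published method. It assumes that (a) a combination such as "Prodigal plus GeneMark" means the union of both tools' gene calls, (b) a predicted gene matches a reference gene when strand and stop-codon coordinate agree, (c) accuracy is the fraction of reference genes recovered, and (d) EPR is the number of predictions with no reference match divided by the number of reference genes. All names (`Gene`, `stop_key`, `combine`, `evaluate`) are hypothetical.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    start: int   # 1-based, inclusive, lower coordinate
    end: int     # 1-based, inclusive, upper coordinate
    strand: str  # "+" or "-"


def stop_key(g: Gene) -> tuple[str, int]:
    # The stop codon (3' end) is the upper coordinate on "+" and the lower on "-".
    return (g.strand, g.end if g.strand == "+" else g.start)


def combine(*predictions: list[Gene]) -> list[Gene]:
    """Assumed union of several tools' calls, de-duplicated by stop-codon key.
    When tools disagree on the start codon, the first tool's call is kept."""
    merged: dict[tuple[str, int], Gene] = {}
    for genes in predictions:
        for g in genes:
            merged.setdefault(stop_key(g), g)
    return list(merged.values())


def evaluate(predicted: list[Gene], reference: list[Gene]) -> tuple[float, float]:
    """Return (accuracy, EPR) under the assumed definitions in the lead-in."""
    ref_keys = {stop_key(g) for g in reference}
    if not ref_keys:
        raise ValueError("reference annotation is empty")
    pred_keys = {stop_key(g) for g in predicted}
    hits = len(ref_keys & pred_keys)    # reference genes recovered
    extra = len(pred_keys - ref_keys)   # predictions absent from the reference
    return hits / len(ref_keys), extra / len(ref_keys)


if __name__ == "__main__":
    reference = [Gene(1, 300, "+"), Gene(500, 800, "-"), Gene(1000, 1300, "+")]
    tool_a = [Gene(1, 300, "+"), Gene(1000, 1300, "+")]
    tool_b = [Gene(10, 300, "+"), Gene(500, 800, "-"), Gene(2000, 2300, "+")]
    print(evaluate(tool_a, reference))                    # accuracy 2/3, EPR 0.0
    print(evaluate(tool_b, reference))                    # accuracy 2/3, EPR 1/3
    print(evaluate(combine(tool_a, tool_b), reference))   # accuracy 1.0, EPR 1/3
```

In this toy case the union recovers every reference gene that either tool finds, while also carrying over tool B's unmatched call. Under these assumed definitions, that illustrates why a combination's accuracy and EPR are worth reporting together.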
Keywords/Search Tags: Genome Prediction, Genome re-annotation, Prodigal, ZCURVE, GeneMark, GLIMMER, Performance Comparison