
Design And Development Of Essentiality Annotation Tool For Bacterial Genes Based On Integrated Features

Posted on: 2020-12-22 | Degree: Master | Type: Thesis
Country: China | Candidate: K Xue
GTID: 2480306131471634 | Subject: Theoretical Physics
Abstract/Summary:
Genes support the basic structure and function of life, but among all the genes of an organism, only some are indispensable for maintaining its life activities under given conditions. These are usually called essential genes, and they are of great significance for research on the evolution of life and for medical research and development. The continuous development of high-throughput sequencing technology has led to rapid growth in DNA sequence data, and corresponding gene annotation tools have emerged alongside it; bacterial genome annotation in particular has become more mature, more accurate and faster. For essentiality annotation, however, there is still a lack of powerful tools that can both meet the requirements of high-throughput processing and make a relatively accurate judgment of essentiality.

This work is based mainly on essential gene data collected from a large body of experimental literature. Various features related to essentiality are extracted and integrated systematically, then combined with machine learning methods and web technology to design and initially develop an online tool for the automatic essentiality annotation of bacterial genomes, named EGG-Prober. It can be used either to identify bacterial essential genes on a genome-wide scale or to systematically annotate essentiality for a de novo sequenced genome.

The contents of each part of this thesis are summarized as follows. The first part introduces the current state of research in bioinformatics and on essential genes in bacterial genomes. The second and third parts introduce the materials and methods used in this work, including dataset construction, feature extraction and the Support Vector Machine (SVM) method. We mainly selected organisms from the DEG15.2 and OGEEv2 databases and established separate training and testing sets from data extracted from the original literature. We integrated multiple features, such as codon usage bias, Z-curve parameters and a BLAST homology search index, to construct and test the essential gene prediction model for bacteria.

The fourth part introduces EGG-Prober, the essentiality annotation system for bacterial genomes; the current version is EGG-Prober 0.1. It is an online service with a user-friendly interface and a wide range of applications. In addition to essentiality annotation and the download of classified sequences, it provides gene location and functional annotation, information on gene distribution across the DNA replication strands, genome structure visualization and other functions.

In the fifth part, the essentiality prediction function of EGG-Prober 0.1 is tested and compared with existing prediction methods. The results show that the sensitivity, specificity and AUC values for most of the 35 strains in the training set were above 0.9. For the six testing strains, the AUC values ranged from 0.862 to 0.999, with an average of 0.907, and the average sensitivity, specificity and accuracy reached 0.816, 0.869 and 0.885, respectively. Further analysis shows that our method, based on integrated features of essentiality, can reduce the over-fitting caused by over-dependence on sequence similarity and has a relatively balanced predictive ability. It may therefore have a broader scope of application and be more practical for newly sequenced strains. Moreover, annotation of a complete bacterial genome can be finished within 15 minutes, which meets the requirement of high throughput. Remaining deficiencies will be addressed in future versions of the system.

At the end of the thesis, we summarize the whole work and discuss prospects for future work. The rich annotation information provided by EGG-Prober will form a good basis for further exploration of bacterial genomes.
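As an illustration of one of the integrated features named above, the following is a minimal Python sketch of the basic three-component Z-curve transform of a DNA sequence, computed from normalized base frequencies. The abstract does not specify which Z-curve variables EGG-Prober actually uses (for example, phase-specific or dinucleotide extensions), so the function name `z_curve_components` and the choice of the simplest mononucleotide form are assumptions made only for illustration.

```python
from collections import Counter


def z_curve_components(seq: str) -> tuple[float, float, float]:
    """Return the (x, y, z) Z-curve components of a DNA sequence.

    Hypothetical helper, not taken from EGG-Prober. Uses normalized
    frequencies of A, C, G and T; any other symbol (e.g. N) is ignored.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")

    a, c, g, t = (counts[base] / total for base in "ACGT")

    x = (a + g) - (c + t)  # purine vs. pyrimidine
    y = (a + c) - (g + t)  # amino vs. keto
    z = (a + t) - (g + c)  # weak vs. strong hydrogen bonding
    return x, y, z


if __name__ == "__main__":
    print(z_curve_components("ATGGCGAAATTTGCA"))
```

In a pipeline of the kind described in the abstract, values such as these would be concatenated with the other features (codon usage bias, homology search index) into one feature vector per gene before being passed to the SVM classifier.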
Keywords/Search Tags:Bacterial genomes, Integrated features, Support Vector Machine, Essentiality annotation, EGG-Prober