
Research And Characteristic Analysis Of Bacteria Essential Gene Cluster Model

Posted on: 2018-11-18
Degree: Master
Type: Thesis
Country: China
Candidate: F Z Zhang
GTID: 2310330515951554
Subject: Engineering
Abstract/Summary:
Essential genes are those indispensable for the survival of a living organism under optimal conditions. Their significance is twofold: (1) the minimal set of essential genes is the basis of synthetic biology, and studying essential genes helps us understand the origin and evolution of life and engineer microorganisms for industrial use; (2) the proteins encoded by essential genes take part in the most basic and important metabolic processes of an organism, so essential genes can serve as targets for antimicrobial drugs. In recent years, the study of essential genes has become a hotspot in bioinformatics research.

This thesis studies bacterial essential genes identified by experimental methods. The original essential gene data come from DEG (http://tubic.tju.edu.cn/deg). Inspired by the clusters in COGs (https://www.ncbi.nlm.nih.gov/COG/), we propose the concept of clusters of essential genes, in which essential genes with the same or similar functions are stored together as a cluster. This is the main difference from the way most databases store genes, and the size of a cluster reflects how conserved its genes are. Bacterial gene data have since been further enriched: the latest version of DEG (as of March 2017) contains 46 essential gene data sets for bacteria and 16 for eukaryotes, which lays the foundation for this research.

Based on the essential gene cluster model and the most up-to-date data, we constructed and updated the database of clusters of essential genes (CEG, Cluster of Essential Genes, http://cefg.cn/ceg/), released as CEG 2.0. The database stores essential genes in clusters and adds a large amount of related information: structures of the proteins encoded by essential genes, virulence factor information, protein-ligand information, metabolic pathway information, and gene-related drug information. In addition, we compared bacterial essential genes with human gene sequences and provide the homology information to users, which is an important reference when discovering new drug targets. Cluster size also has biological significance: the larger the cluster, the more conserved the genes it contains. By looking at cluster size, users can directly tell whether a gene function is conserved across multiple species or is species-specific.

Building on the database, we propose a new prediction algorithm for bacterial essential genes based on clusters of essential genes, the K-value algorithm. Its main principle is to predict essential genes from the size of the clusters. Users need to provide only the gene name to predict gene essentiality; no sequence information is required. In the new version of CEG_Match, we added a feature that lets users predict essentiality not only from gene function but also from gene sequence. Compared with the traditional approach of predicting essential genes by homology search, CEG_Match has higher accuracy, a lower false positive rate, and faster running speed. This overcomes the limitation of the prediction algorithm in CEG 1.0, which could predict essential genes only from the gene name. Finally, we present statistics on the database constructed in this work, including species, clusters, and gene functions, and outline expectations for future work.
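
The abstract gives only the principle of the K-value algorithm (predict essentiality from cluster size, given a gene name), not its details. The Python sketch below illustrates that principle under loudly stated assumptions: clusters are keyed by lowercase gene name, cluster size is taken to be the number of species in which the gene was reported essential, and a gene is predicted essential when that size reaches a threshold `k`. The function names, the threshold rule, and the toy records are all hypothetical and are not taken from CEG or CEG_Match.

```python
from collections import defaultdict


def build_clusters(records):
    """Group (species, gene_name) records into clusters keyed by gene name.

    ASSUMPTION: genes with the same name are treated as one cluster; the
    thesis groups genes by "same or similar function", which may be broader.
    Returns a dict: lowercase gene name -> set of species.
    """
    clusters = defaultdict(set)
    for species, gene_name in records:
        clusters[gene_name.lower()].add(species)
    return dict(clusters)


def predict_essential(gene_name, clusters, k=2):
    """Predict essentiality from cluster size alone.

    ASSUMPTION: a simple threshold rule (size >= k) stands in for the
    K-value algorithm, whose exact form is not given in the abstract.
    """
    size = len(clusters.get(gene_name.lower(), ()))
    return size >= k


# Toy records, made up for illustration only (not DEG or CEG data).
records = [
    ("species_A", "dnaA"),
    ("species_B", "dnaA"),
    ("species_C", "dnaA"),
    ("species_A", "ftsZ"),
    ("species_B", "ftsZ"),
    ("species_A", "geneX"),
]

clusters = build_clusters(records)
for name in ("dnaA", "ftsZ", "geneX", "unknownGene"):
    print(name, predict_essential(name, clusters, k=2))
```

In this sketch a larger `k` demands conservation across more species before a gene is called essential, which mirrors the abstract's statement that larger clusters contain more conserved genes; a species-specific gene such as the hypothetical `geneX` falls below the threshold.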
Keywords/Search Tags:bacterial essential gene, model of cluster of essential gene, CEG_Match