Identification of Bacterial Essential Genes and Analysis of Evolutionary Characteristics

Posted on: 2017-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Deng
GTID: 2180330485985063
Subject: Biomedical engineering
Abstract/Summary:
Essential genes play vital roles in bacterial survival: they encode proteins required for the life and reproduction of all classes of organisms. These genes have been validated as drug targets in microbial systems, and they can also deepen our understanding of the origin and evolution of life. Identifying bacterial essential genes is therefore of indispensable value in bioinformatics.
Among the methods for identifying bacterial essential genes, experimental approaches are undoubtedly the most accurate. However, because experiments are time-consuming and costly, essential genes have been identified in only a few bacterial species, so accurate computational identification of bacterial essential genes has become necessary. This thesis takes bacterial essential genes as its main object of study and uses computational methods to identify them from composition features of their primary sequences. First, composition features were extracted from the genome sequences using the annotation files. Second, we evaluated two machine learning methods, support vector machine (SVM) and principal component regression (PCR), for identifying bacterial essential genes; to our knowledge, this is the first work to identify bacterial essential genes using PCR. On an artificially balanced dataset, the AUC (area under the receiver operating characteristic curve) reached 0.83 with SVM and 0.87 with PCR. We then improved the two methods by applying a significance test before SVM (tt SVM) and by adding a kernel to PCR (KPCR), which yielded AUCs of 0.87 for tt SVM and 0.84 for KPCR. Applying the four methods to other bacterial species gave a highest AUC of 0.95. Overall, SVM and tt SVM performed better than PCR and KPCR, whereas PCR and KPCR were more stable than SVM and tt SVM.
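The abstract does not specify the feature set or the PCR settings. As a rough illustration only, the Python sketch below shows one way a composition-feature, PCR and AUC pipeline could fit together: mononucleotide and dinucleotide frequencies stand in for the composition features (an assumed feature set), PCR is implemented as least-squares regression of 0/1 essentiality labels on the top-k principal-component scores, and AUC is computed with the Mann-Whitney rank formulation. The function names, the choice of k, and the GC-biased toy genes are hypothetical; the SVM, the significance-test filter (tt SVM) and the kernel variant (KPCR) are not shown.

```python
import itertools

import numpy as np

BASES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in itertools.product(BASES, repeat=2)]


def composition_features(seq):
    """Hypothetical composition vector: 4 mononucleotide + 16 dinucleotide
    frequencies. Assumes len(seq) >= 2."""
    seq = seq.upper()
    mono = [seq.count(b) / len(seq) for b in BASES]
    # Overlapping dinucleotides, so "ACG" contributes both AC and CG.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    di = [pairs.count(d) / len(pairs) for d in DINUCLEOTIDES]
    return mono + di


def fit_pcr(X, y, k):
    """Principal component regression: regress labels on the top-k PC scores."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T
    scores = Xc @ V
    design = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return mean, V, coef


def predict_pcr(model, X):
    """Continuous score; larger values look more 'essential-like'."""
    mean, V, coef = model
    return coef[0] + ((X - mean) @ V) @ coef[1:]


def auc(y_true, scores):
    """ROC AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Toy data only: label 1 ("essential") genes are generated GC-richer than label 0.
    def random_gene(gc, length=300):
        probs = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T
        return "".join(rng.choice(list(BASES), size=length, p=probs))

    genes = [random_gene(0.6) for _ in range(50)] + [random_gene(0.4) for _ in range(50)]
    y = np.array([1.0] * 50 + [0.0] * 50)
    X = np.array([composition_features(g) for g in genes])
    model = fit_pcr(X, y, k=3)
    print(f"training AUC: {auc(y, predict_pcr(model, X)):.2f}")
```

The training AUC printed here only exercises the functions; a real evaluation would score held-out genes, for example with cross-validation.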
Finally, we built models from the bacteria whose AUCs were all higher than 0.8 and used them to construct a free web service named IBEG (http://cefg.uestc.edu.cn/ibeg/). With this service, researchers can not only identify essential genes but also compare the four methods.
Furthermore, we analyzed the evolutionary conservation of essential genes, genes with highly biased codon usage, and highly expressed genes from the perspectives of functional genes and horizontally transferred genes. Among functional genes, essential genes account for the highest proportion, and the more important a gene's function, the more evolutionarily conserved it is. Among horizontally transferred genes, essential genes also account for the highest proportion, which we attribute to housekeeping genes; hence, essential genes appear more likely to undergo horizontal transfer.
In summary, this study used new computational methods to identify bacterial essential genes from composition features and introduced new features. We also studied the evolutionary characteristics of essential genes. However, many problems still require in-depth study.
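The abstract reports that essential genes make up the largest share of both functional and horizontally transferred genes, but it does not say how that comparison was tested. A minimal sketch, assuming each gene carries a category label and an essential/non-essential flag, computes the essential fraction per category and a one-sided hypergeometric p-value for enrichment of essential genes in one category. The function names, category labels and counts are invented for illustration, and the hypergeometric test is a substituted technique, not necessarily the one used in the thesis.

```python
from collections import Counter
from math import comb


def essential_fraction_by_category(genes):
    """genes: iterable of (category, is_essential) pairs -> {category: fraction essential}."""
    totals, essentials = Counter(), Counter()
    for category, is_essential in genes:
        totals[category] += 1
        essentials[category] += int(is_essential)
    return {c: essentials[c] / totals[c] for c in totals}


def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N genes, K essential, n genes in the category)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


if __name__ == "__main__":
    # Invented toy genome: 1000 genes, 150 essential, 80 flagged as horizontally transferred.
    genes = ([("HGT", True)] * 30 + [("HGT", False)] * 50
             + [("other", True)] * 120 + [("other", False)] * 800)
    print(essential_fraction_by_category(genes))
    print(hypergeom_enrichment_p(N=1000, K=150, n=80, k=30))
```

Fractions alone can mislead when categories differ greatly in size, which is why the sketch pairs them with a test that conditions on the genome-wide number of essential genes.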
Keywords: essential genes, composition features, machine learning, evolution