Towards A Better Detection Of Horizontally Transferred Genes By Combining Unusual Properties Effectively

Posted on: 2014-05-11 | Degree: Master | Type: Thesis
Country: China | Candidate: D P Xiong | Full Text: PDF
GTID: 2250330401490199 | Subject: Computer Science and Technology
Abstract/Summary:
Horizontal gene transfer (HGT, also called lateral gene transfer) is the transfer of genetic material from one lineage to another by means other than descent, and it has played a key role in species evolution and microbial genome diversification. Transfers can occur between both closely and distantly related species or strains, and are thought to be frequent events. Among single-celled organisms, HGT is perhaps the dominant form of genetic transfer. HGT has also been proposed to contribute to the emergence of novel human diseases and to pose several risks to humans. As sequence data have accumulated, evidence for rampant HGT has increased dramatically. Detecting HGT therefore has great practical significance, both for better understanding the impact of HGT on genome evolution and for identifying new drug targets.

A number of computational methods for finding horizontally transferred genes have been proposed over the past decades, but none has yet provided a reliable detector. At present there are two primary strategies: phylogenetic approaches and parametric approaches. Phylogenetic approaches are time-consuming and insufficiently robust. Existing parametric approaches either use a single compositional property in the detection process or simply combine the results obtained from each property separately. Because different properties carry different information, a single property cannot sufficiently capture the information encoded in gene sequences. In addition, the class imbalance problem in the datasets, which also causes large errors in gene detection, has not been considered by published methods based on machine learning.

In light of these caveats, in this study we developed a new strategy, Hgtident, which uses a support vector machine (SVM) to detect horizontally transferred genes by effectively combining their unusual properties, thereby improving detection accuracy. Hgtident includes the introduction of more representative datasets, optimization of the SVM model, feature selection based on a genetic algorithm (GA), handling of the class imbalance problem in the datasets, and extensive performance evaluation via systematic cross-validation. Through feature selection, we found that JS-DN and JS-CB have higher discriminating power for HGT detection, while GC1-GC3 and K-mer (1≤K≤7) make the least contribution. Extensive experiments indicated that the new classifier improves Recall to some extent and also reduces Mean error dramatically. For the testing genomes, compared with the existing popular multiple-threshold approach, Recall improved by 2.81% and Mean error was reduced by 26.32% on average, which not only means that numerous false positives were identified correctly, but also that our viewpoint is effective and reliable.

The Hgtident approach presented in this work is the first to use such an integrated strategy to identify horizontally transferred genes, and it is an effective approach for better detecting HGT. Extensive experiments demonstrated that combining multiple features of HGT is essential for detecting a wider range of HGT events.
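As an illustration of the kind of compositional feature discussed above, the following is a minimal sketch of a dinucleotide-based divergence score. The abstract does not define JS-DN, so this sketch assumes it refers to a Jensen-Shannon divergence between the dinucleotide frequency distribution of a gene and that of its host genome. The function names (`dinucleotide_frequencies`, `js_divergence`, `js_dn`) are hypothetical and are not taken from Hgtident.

```python
import math
from collections import Counter
from itertools import product

# The 16 possible dinucleotides over the unambiguous DNA alphabet.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_frequencies(seq):
    """Relative frequencies of the 16 overlapping dinucleotides in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    # Dinucleotides containing ambiguous bases (e.g. N) are ignored.
    total = sum(counts[d] for d in DINUCLEOTIDES)
    if total == 0:
        raise ValueError("sequence contains no unambiguous dinucleotides")
    return [counts[d] / total for d in DINUCLEOTIDES]


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two distributions.

    With base-2 logarithms the value lies between 0 and 1.
    """
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def js_dn(gene, genome):
    """Assumed JS-DN feature: divergence of a gene from its genome background."""
    return js_divergence(dinucleotide_frequencies(gene),
                         dinucleotide_frequencies(genome))
```

Under this assumption, a gene whose dinucleotide usage matches the genome background scores near 0, and a compositionally atypical gene scores higher. A score of this kind would be one entry in the feature vector passed to a classifier such as an SVM, not a detector on its own.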
Keywords/Search Tags: Horizontal gene transfer, computational methods, combining the unusual properties, support vector machine (SVM), genetic algorithm (GA)