Predicting Functional Prophages In Bacterial Genomes From High-throughput Sequencing

Posted on: 2022-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: Q Niu
Full Text: PDF
GTID: 2480306731987689
Subject: Computer Science and Technology
Abstract/Summary:
A functional prophage can integrate into its host's DNA or persist as latent episomal DNA, and it can play an important role in the acquisition and enhancement of bacterial virulence. With the recent release of high-throughput sequencing (HTS) data for thousands of bacteria, there has been growing interest in analyzing the interdependency between a prophage and its host. One of the basic tasks is to predict functional prophages and then extract the complete prophage sequence from the host bacterium. However, existing tools cannot determine from the prophage genome whether a prophage is functional. Computational prediction of functional prophages from HTS data would be an effective alternative that reduces the cost and tedium of biological experiments. This thesis proposes ProFPhD, the first tool to predict functional prophage sequences from HTS data automatically and accurately. Multi-threaded optimization is used to speed up ProFPhD's prediction and sequence extraction, and the predicted functional prophage sequences are integrated into a functional prophage database. The work comprises the following three tasks:

(1) ProFPhD: an algorithm for HTS-based prophage prediction and functional verification. Biological induction experiments obtain functional prophages by separation after treatment with mitomycin C (a chemical inducer), and they require manual execution and judgment. Existing prophage prediction tools mainly build a phage protein database to annotate the host bacterium's DNA, and regard the clustered region, or a region of approximately one prophage length upstream and downstream of the integrated gene, as the prophage region. This thesis therefore designs and implements ProFPhD, a functional prophage prediction algorithm based on HTS technology. First, it applies the "sliding window" principle to find the two integration sites, attL and attR, so that the prophage region can be predicted more accurately. Second, using an improved graph model, it searches for strictly matching reads that join the two ends of the prophage. Finally, the complete functional prophage sequence is extracted with an end-extension algorithm. The extracted sequences are verified by wet-laboratory experiments. In a case study, ProFPhD was applied to a set of HTS data from the NCBI database and predicted 10 functional prophages from 72 bacterial isolates. The 72 strains were then induced with mitomycin C, and the 10 functional prophages were deeply sequenced. Compared with the results of the induction experiments, the accuracy of the functional prophages predicted by ProFPhD is as high as 90%. A comparison with existing prophage search tools shows that ProFPhD offers the same features and performance and can additionally verify whether a prophage is functional.

(2) Multi-threaded parallel implementation of ProFPhD. On a 1000 M sequencing data set, the serial single-threaded ProFPhD takes about 3-4 hours to predict prophages and verify whether they are functional. The running time is consumed mainly by three steps: predicting the prophage region, verifying whether the prophage is functional, and extracting the full sequence of the functional prophage with the end-extension algorithm. ProFPhD is therefore parallelized in these three steps. Tests show parallel speedups of up to 10, 7 and 2.4 for the three steps, respectively, and an overall speedup of 5.11.

(3) Construction of the functional prophage database. At present, researchers can obtain the complete sequence of a functional prophage only by consulting the literature or by annotating the host bacterial genome. However, only 1,100 publications concern functional prophages, accounting for 8% of the research literature on bacteriophages. To address these problems, this thesis designs a method that automatically extracts complete sequences and builds a functional prophage gene sequence database, using self-designed scripts to download bacterial sequencing data in batches. At present, the method can download about 1,000 sets of sequencing data per day on a high-performance server, and it has successfully predicted and extracted about 30,00 functional prophage gene sequences from about 30 T and about 60,000 sets of bacterial sequencing data.
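
The abstract does not specify how the sliding-window search for attL and attR works in detail, so the following is only a minimal sketch of the general idea and not ProFPhD's implementation. It assumes that the two att sites form an exact direct repeat and that an approximate prophage region is already known on an assembled genome string (ProFPhD itself works from HTS reads). The function name `find_att_sites` and the parameters `flank` and `min_len` are hypothetical.

```python
from typing import Optional, Tuple


def find_att_sites(
    genome: str,
    region_start: int,
    region_end: int,
    flank: int = 200,
    min_len: int = 12,
) -> Optional[Tuple[int, int, str]]:
    """Return (attL_pos, attR_pos, repeat) for the longest exact direct
    repeat shared by the two flanks of a candidate region, or None.

    Assumption: attL and attR are modelled as identical substrings.
    """
    # Flank windows around the approximate left and right boundaries.
    left_lo = max(0, region_start - flank)
    left_hi = min(len(genome), region_start + flank)
    right_lo = max(left_hi, region_end - flank)  # keep flanks disjoint
    right_hi = min(len(genome), region_end + flank)
    left = genome[left_lo:left_hi]
    right = genome[right_lo:right_hi]

    # Index every min_len-mer of the right flank by its offsets.
    seeds = {}
    for j in range(len(right) - min_len + 1):
        seeds.setdefault(right[j:j + min_len], []).append(j)

    best = None
    # Slide a window of width min_len across the left flank.
    for i in range(len(left) - min_len + 1):
        for j in seeds.get(left[i:i + min_len], ()):
            # Extend the seed match to the right while bases agree.
            n = min_len
            while (i + n < len(left) and j + n < len(right)
                   and left[i + n] == right[j + n]):
                n += 1
            if best is None or n > len(best[2]):
                best = (left_lo + i, right_lo + j, left[i:i + n])
    return best
```

Indexing the right flank first keeps the search roughly linear in the flank size. Real att sites may contain mismatches, which this exact-match sketch would miss.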
Keywords/Search Tags: High-throughput sequencing (HTS), Functional prophage, Prediction tool, Parallelization, Database construction