Font Size: a A A

Research On Parallel Algorithm Of Multiple DNA Sequence Alignment Based On De Bruijn Graph

Posted on:2011-03-10Degree:DoctorType:Dissertation
Country:ChinaCandidate:H ZhouFull Text:PDF
GTID:1488303380981989Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Nowadays, Multiple Sequence Alignment is an important topic in Biology information industry, which has wide range of applications in area of Gene Identification, Structure Prediction, etc. However, satisfying algorithm for Multiple Sequence Alignment are still not available, due to the inherent complexity of the problem; meanwhile, with increasing quantity of biological data, Serial Algorithm is no longer able to meet the calculation demands. This research is focusing on how to apply the de Brujin graph in Multiple Sequence Alignment as well as parallel processing program. We also propose a new Multiple Sequence Alignment algorithm, namely PL_GAlign. The main work and contributions of the research are summarized as follows:Introduce the distance parameter d to algorithm of multiple sequence alignment based on de Bruijn graph and adopt the improved star-Align algorithm. Firstly, detailed analysis is conducted for the Multiple Sequence Alignment algorithms that are widely utilized in these days. However, the usual Parallel division strategy does not perform very well for these algorithms. Therefore, the key of the analysis is exploring and improving the Multiple Sequence Alignment algorithm based on graph, as well as make necessary refinements. To better take Genetic variability into account, we introduce the distance parameter d and replacing the current precise matches with vague match that allowed certain errors. After we obtained center series by applying de Bruijn graph, instead of applying dynamic programming algorithm, we adopt the improved star-Align algorithm; which are more suitable for this situation and can successfully reduce the complexity of the problem almost to linearity.Secondly, discuss the strategies for parallel processing in each stage. We study the parallel processing based on de Bruijn graph, in order to deal with high computational complexity of Multiple Sequence Alignment. We discuss the serial processing and possibility of parallel processing in the stage of building de Bruijn graph, removing circles?finding maximum weighted path and pair-wise alignment,and the strategies for parallel processing in each stage are also proposed in the research.Thirdly, we did some tests with a series of data and the experiment shows that the processing speeds of PL_GAlign algorithm are much higher then the current iterative algorithm, especially in the case with longer input sequence and more data. In terms of precision, PL_GAlign algorithm is slightly better than widely used CLUSTAL W algorithm.
Keywords/Search Tags:Multiple Sequence Alignment, de Bruijn graph, star-Align, maximum weighted path
PDF Full Text Request
Related items