DNA sequence alignment is of great value in biology and other fields.New DNA structures and functions can be explored and discovered through sequence alignment,and the evolutionary links between different species can also be obtained.As the research on biological sequences becomes deeper and deeper,more and more complex DNA sequences are recognized by people,and new sequence structure,functional information,and evolutionary relationships have also been added to sequence comparison models.At the same time,the calculation amount of sequence comparison has also increased,and the current algorithm or processing speed is difficult to meet the efficiency requirements,or a large amount of computing resources have been wasted.Therefore,it is an urgent problem to use the computational resources efficiently and improve the efficiency of sequence matching.Among the algorithms for multiple sequence matching,the de Bruijn graph-based star matching algorithm is widely used.However,it is difficult to perform efficient parallelism while ensuring the quality of sequence matching,and the large amount of intermediate data makes it exceptionally difficult to construct a complete de Bruijn graph due to memory limitation,and the intensive computation involved also makes the sequence matching efficiency low.In this paper,we propose a parallelization of the de Bruijn graph-based star matching algorithm and use a heterogeneous environment composed of CPUs and GPUs to further improve the sequence matching efficiency in response to the problems of high memory demand and low processing efficiency caused by the rapid growth of biological datasets.The main contributions of this paper are two:(1)We analyzed the basic flow of de Bruijn graph-based star matching algorithm and explores the parallelizability.Moreover,we analyzed and proposed a parallelization strategy for constructing de Bruijn graph,removing de Bruijn graph loops,finding central sequences and sequence matching methods based on star matching.Firstly,the problem of load balancing was fully considered to parallelize the construction of de Bruijn graph;secondly,the loop removal algorithm of de Bruijn graph was designed in parallel in order to minimize the influence of communication;thirdly,for the complete np problem of finding the central sequence,the feasible solution threshold was obtained through primary analysis and all feasible solutions were screened in parallel,which greatly reduced the computation;finally,the consideration of Load balancing was performed to parallelize the starratio pairing algorithm.The algorithm(M-Aglin)passed the strong scalability test with 512 processes,among which 128 processes in comparison with CLUSTAL W.Moreover,the algorithm has less running time and higher SPS and CS function scores than CLUSTAL W.(2)We design and implement a heterogeneous port of de Bruijn graph construction on a heterogeneous platform composed of CPU and GPU.Firstly,in order to reduce the resource consumption caused by I/O,the data was compressed and then passed into the core on the device side.Secondly,we redesigned hash table to ensure stable operation for reducing the pressure on memory.Finally,the features in the de Bruijn graph that were fully considered in the process of constructing the graph are optimized in the composition algorithm.In order to improve the efficiency of the algorithm,two optimization strategies were proposed.Firstly,the data can be compressed again by redesigning the coding to further reduce the I/O consumption;secondly,the data transfer and computation were overlapped to achieve the effect of hidden computation,and the pipeline was optimized to improve the efficiency of the algorithm.The heterogeneous algorithm(M-Aglin-MT)has about 30% and 40% improvement in SPS and CS function scores,respectively,compared with the M-Aglin algorithm,and accelerates up to 31 times faster for the de Bruijn graph part and 1.47 times faster for the overall algorithm. |