Font Size: a A A

Research Of High Performance Bit-parallel Longest Common Subsequence Algorithm

Posted on:2019-08-06Degree:MasterType:Thesis
Country:ChinaCandidate:X N WangFull Text:PDF
GTID:2370330545953706Subject:Software engineering
Abstract/Summary:PDF Full Text Request
The LCS problem is the problem of finding the longest subsequence in a data set(usually two sequences)(without changing sequence order).The longest common subsequence problem is a classical problem in many fields,such as computational biology,data Mining,text editing and other fields,as well as data comparison programs and bioinformatics applications,similar to the biological sequence similarity measurement problem.The current solution to solve the LCS problem is based on dynaMIC programming.As the biological sequence databases are growing continuously,parallel sequence comparison algorithms are becoming increasingly important.Based on the bit-parallel LCS algorithm,each character is mapped to a bits of the computer word,which reduces the scale of the matrix,saves the storage space,and operates on 32 characters,reduces the number of operations and achieves a fine granularity parallelism.Although the calculation speed is relatively higher,but with the production of large data,more and more data need to be processed,only using CPU to calculate the speed of data generation,using High-performance Computing Accelerator card acceleration is imminent.The computer cluster based on KNC and KNL accelerated card has a greater advantage,which allows us to achieve a greater degree of parallelism and achieve higher program performance.In this paper,we present XLCS,a new parallel implementation to accelerate the LCS algorithm on Xeon Phi clusters by performing bit-wise operations.We design an asynchronous data transmission and calculation framework,which can cover up most of IO communication,and in a single node,use various optimization techniques,such as quantization,OpenMP,etc.to improve the performance of the algorithm.In the cluster multi node,using MPI to carry on the multithreading communication,the cluster node is divided into the main node and the compute node,the main node is responsible for the data distribution preprocessing,the result collection processing,the computation node is responsible for the computation,the computation completes returns the result to the main node,at the same time the Through the above technical means,make full use of distributed resources,realize the optimization of the LCS algorithm on the cluster.Our performance evaluation shows that XLCS achieves a peak performance of 3.61 TCUPS on a single Xeon Phi 31S1P card and 8.20 TCUPS on a single Xeon Phi 7210 card.Compared to the wellparallelized LCS algorithm OCS implemented on three M2090 cards,XLCS achieves a speedup of 3.6ืon a KNC and 8.1ืon a KNL,respectively.XLCS further achieves 93.84 TCUPS running 16 compute nodes of the Tianhe-2 supercomputer and 312.30 TCUPS with 32 compute nodes of a KNL cluster.To our knowledge,this is the first reported implementation of the bit-parallel LCS algorithm on Xeon Phi clusters.XLCS is availabe at https://github.com/wxiaoning/LongestCommon-Subsequence.
Keywords/Search Tags:Longest Common Subsequence, Bit-Parallel Algorithm, Cluster Computing, Many-core Architectures
PDF Full Text Request
Related items