
Biological Big Data Parallelization On Hybrid Heterogeneous Architectures

Posted on: 2019-05-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H D Lan
Full Text: PDF
GTID: 1368330545953581
Subject: Computer Science and Technology
Abstract/Summary:
This dissertation focuses on parallel computing methods for multiple underlying high-performance heterogeneous architectures, and primarily addresses challenges in three aspects: (1) computing methods under limited memory capacity, (2) multi-level algorithmic parallelization based on heterogeneous architectures, and (3) abstraction for different underlying computing hardware. Among the three aspects, hardware abstraction is the most significant: it provides a well-defined abstract hardware model that separates framework design from specific hardware into respective layers, and therefore simplifies the software framework. It also facilitates fast deployment over a wide variety of devices and a flexible memory layout through a concise high-level abstraction. At the same time, the unified hardware model provides a comparable perspective on various devices, which helps to discover the key factors that influence the performance of an implementation. Hence, the optimization insights learned from existing methods can be further applied to implement highly efficient kernel functions on other architectures.

In the past decade there has been an explosion in the amount of biological sequence data available due to rapid advances in high-throughput sequencing technologies. Biologists are keen to analyze and understand this data, since genetic sequences determine the biological structure, and thus the function, of proteins. However, the growth in available biological data is not merely incremental. The amount of data is now so great that traditional data analysis approaches are no longer sufficient for rapidly performing life science queries involving the fusion of data types. On the other hand, rapidly increasing computing capabilities are being delivered by innovative heterogeneous and many-core hardware architectures, including GPUs, Xeon Phis and Sunway processors built with domestic technologies. Heterogeneity means that the system has to handle multiple different architectures. Each architecture is designed for a specific domain
to maximize energy efficiency. The many-core trend results from the lithography bottleneck: sequential performance no longer improves for free as manufacturing processes advance. Hence, conventional algorithms face challenges on two fronts: the explosive growth in data size and the change in architectures.

In terms of data size, this dissertation proposes an asynchronous computing method that exploits inherent data and task parallelism by processing data/task subsets in batches. A processing pipeline is established so that large-scale datasets can be processed with very little resource usage. The asynchronous method scales well with both the data size and the size of the computing cluster. This dissertation presents methods that apply to pairwise sequence alignment and multiple sequence alignment, and that manage to process large-scale datasets which other computing tools fail to process. The computing efficiency improves further at larger scales.

In terms of heterogeneous computing, this dissertation proposes multi-level optimization methods for Xeon Phi and CUDA-enabled heterogeneous architectures and the Knights Landing homogeneous architecture. The methods examine each architecture in depth and establish a theoretical performance model on that basis. The theoretical performance model leads to the development of highly efficient optimization methods for a wide variety of dynamic programming algorithms. Specifically, I located and modelled a critical performance bottleneck on Xeon Phi, and addressed it by reordering the computation and refactoring data dependencies to improve data access locality. The resulting performance is still the highest on this architecture to date, and approaches the device's peak performance.

In terms of abstract device modelling, this dissertation proposes a unified programming model, from the execution perspective, that fits both SIMD and SIMT devices, and further divides processors into latency-oriented and throughput-oriented ones. I designed a C++ class
hierarchy by analyzing the commonalities and characteristics of devices based on the abstract device model, thereby minimizing the architecture-specific part. The common part is highly abstract and optimized, and fully exploits the opportunities for asynchronous execution inside heterogeneous systems. The model also enables a theoretically optimal data layout and a unified access interface. Optimization techniques can now be compared with each other to inspire new approaches.

With contributions from the three aspects, the methods proposed in this dissertation support the CUDA, KNC, SSE, AVX2 and AVX512 instruction sets, and outperform all other state-of-the-art works on the respective computing platforms. They are also capable of rapidly searching a protein database whose size approaches 40 GB, and can potentially scale to larger databases. Hence, the proposed methods successfully address the data-scaling and architectural challenges with which conventional algorithms and software frameworks are struggling. The methods are ready to be extended to future algorithms as well as upcoming computing architectures.
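
As a rough illustration of the batch-wise processing of dynamic programming alignment tasks described in the abstract, the following C++ sketch scores a query against a sequence database one batch at a time. It is not the dissertation's implementation: the names `Scoring`, `sw_score` and `search_in_batches`, the scoring values, and the linear gap penalty are all assumptions made for this example. The kernel is a plain scalar Smith-Waterman score, and the batches are processed sequentially rather than asynchronously or on an accelerator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical scoring parameters (assumed for illustration only).
struct Scoring {
    int match = 2;
    int mismatch = -1;
    int gap = -2;  // linear gap penalty
};

// Smith-Waterman local alignment score computed with two rolling rows,
// so memory use is O(|target|) instead of O(|query| * |target|).
int sw_score(const std::string& query, const std::string& target,
             const Scoring& s = Scoring{}) {
    std::vector<int> prev(target.size() + 1, 0);
    std::vector<int> curr(target.size() + 1, 0);
    int best = 0;
    for (std::size_t i = 1; i <= query.size(); ++i) {
        curr[0] = 0;
        for (std::size_t j = 1; j <= target.size(); ++j) {
            const int diag = prev[j - 1] +
                (query[i - 1] == target[j - 1] ? s.match : s.mismatch);
            const int up = prev[j] + s.gap;
            const int left = curr[j - 1] + s.gap;
            curr[j] = std::max({0, diag, up, left});
            best = std::max(best, curr[j]);
        }
        std::swap(prev, curr);
    }
    return best;
}

// Scores the query against every database sequence, visiting the
// database in fixed-size batches. In a real pipeline each batch would be
// loaded and dispatched separately; here all batches run in sequence.
std::vector<int> search_in_batches(const std::string& query,
                                   const std::vector<std::string>& database,
                                   std::size_t batch_size) {
    assert(batch_size > 0);
    std::vector<int> scores;
    scores.reserve(database.size());
    for (std::size_t start = 0; start < database.size(); start += batch_size) {
        const std::size_t end = std::min(start + batch_size, database.size());
        for (std::size_t k = start; k < end; ++k) {
            scores.push_back(sw_score(query, database[k]));
        }
    }
    return scores;
}
```

Calling `search_in_batches(query, database, batch_size)` returns one score per database sequence in database order. Because each sequence is scored independently, the inner loop over a batch is the natural place where a SIMD or SIMT kernel, or an asynchronous hand-off to a device, could be substituted.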
Keywords/Search Tags: High Performance Computing, Heterogeneous Computer Architectures, Bioinformatics, Sequence Alignment, Big Data Processing