Design And Implementation Of A Parallel Container-Based Protein Structure Alignment System

Posted on: 2021-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: C Yao
Full Text: PDF
GTID: 2480306575955489
Subject: Software engineering
Abstract/Summary:
Protein structure comparison provides useful information for identifying functional and evolutionary relationships between proteins. With the dramatic growth of structure data in protein databases, computation time is rapidly becoming a bottleneck for large-scale structural comparison. Handling the information-intensive multiple structure alignment (MSTA) task more efficiently requires schemes that accelerate existing biological tools. Tools that combine good alignment quality with short computation time include mTM-align, Matt, MAMMOTH-mult, and MUSTANG, but most of them run in local stand-alone mode and leave considerable room for improvement in running speed. Practitioners have therefore used multi-threading, GPUs, and big data components to modify multiple structure alignment tools. Most of these modifications require extensive changes to the original algorithms or introduce a large number of third-party dependencies, which can impose significant learning and maintenance costs on users. The container-based parallel protein structure alignment system builds on mTM-align, which is split into stages according to the characteristics of each alignment step. A different acceleration scheme is applied to each stage so as to maximize the overall speedup. The acceleration schemes mainly use the pipe function of the big data component Spark and OpenMP in g++, which keeps changes to the original tool minimal; most of the resulting environment dependencies are controlled and deployed through containerization combined with Argo workflow components. The system manages the alignment tool and its workflows through a web interface and uses a reverse proxy to access internal web pages. It deploys and schedules containers on Kubernetes and automates the deployment of big data components such as HDFS using scripts and package management tools. The system combines the advantages of big data processing and multi-threading with the convenience of containerization. It significantly reduces protein structure alignment time while lowering the time and cost of system deployment, operation, and maintenance, and it offers bioresearchers a fast and convenient way to perform protein structure alignment.
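
The pipe-style idea mentioned in the abstract, streaming work items through an unmodified external program over stdin and stdout, can be illustrated with a minimal sketch. This is not the thesis implementation: it uses only the Python standard library in place of Spark, it assumes one stage consists of independent pairwise comparisons (the abstract does not list the stages), and `FAKE_ALIGNER`, `pipe_pair`, and `align_all_pairs` are hypothetical names introduced for illustration.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Hypothetical stand-in for an unmodified alignment binary: it reads one
# "idA idB" pair from stdin and writes one result line to stdout.
# A real deployment would substitute the actual tool's command line.
FAKE_ALIGNER = [
    sys.executable, "-c",
    "import sys; a, b = sys.stdin.read().split(); print(a, b, 'aligned')",
]

def pipe_pair(pair, command=FAKE_ALIGNER):
    """Send one structure pair to the external command via stdin; return its stdout line."""
    result = subprocess.run(
        command, input=" ".join(pair), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()

def align_all_pairs(structure_ids, workers=4):
    """Run the external command on every unordered pair of IDs, in parallel."""
    pairs = list(combinations(structure_ids, 2))
    # Threads are sufficient here because each worker only waits on a subprocess.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(pipe_pair, pairs))
```

Calling `align_all_pairs(["1abc", "2xyz", "3def"])` runs the placeholder command once per pair and returns the result lines in pair order. The point of the design is that the external tool is treated as a black box, so parallelism is added around it without touching its source.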
Keywords/Search Tags:Protein structure alignment, Spark, Multi-thread, Container