Design And Implementation Of A Parallel Container-Based Protein Structure Alignment System

Posted on:2021-03-09

Degree:Master

Type:Thesis

Country:China

Candidate:C Yao

Full Text:PDF

GTID:2480306575955489

Subject:Software engineering

Abstract/Summary:

Protein structure comparisons can provide useful information for identifying functional and evolutionary relationships between proteins. With the dramatic increase in protein structure data in protein databases, computation time is rapidly becoming a bottleneck for large-scale structural comparisons. To handle the information-intensive multiple structure alignment (MSTA) task more efficiently, biological tools need to be accelerated. In the biological field, tools that combine good alignment results with short computation times include mTM-align, Matt, MAMMOTH-mult and MUSTANG, but most of them run in local stand-alone mode and leave considerable room for improvement in running speed. The industry has therefore modified multiple structure alignment tools using multi-threading, GPUs and big data components. Most of these modifications require designers to change the original algorithms extensively or to introduce a large number of third-party component dependencies, which can impose significant learning and maintenance costs on users.

The container-based parallel protein structure alignment system is built on mTM-align, which is divided into stages according to the characteristics of its different alignment steps. Each stage uses its own acceleration scheme so as to maximize the overall speedup. The acceleration relies mainly on the pipe operation of the big data component Spark and on OpenMP in g++, which keeps changes to the original tool minimal, and most of the resulting environment dependencies are easily controlled and deployed through containerization combined with the Argo Workflows engine. The system manages the protein structure alignment tool and its workflows through a web front end, and uses a reverse proxy to access internal web pages. It deploys and schedules containers with Kubernetes and automates the deployment of big data components such as HDFS using scripts and package management tools.

The container-based parallel protein structure alignment system combines the strengths of big data processing and multi-threading with the convenience of containerization. It significantly reduces protein structure alignment time while also reducing the time and cost of system deployment, operation and maintenance, giving bioresearchers an effective way to perform protein structure alignment quickly and conveniently.
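The pipe-based stage described above can be illustrated with a minimal sketch. This is not the thesis's code. It assumes that structures are identified by file name, that the wrapped aligner reads one "a b" pair per stdin line and prints "a b score", and it substitutes a hypothetical Python stand-in (ALIGNER_CMD) with a dummy score for a real TM-align wrapper. The helper names all_pairs, partition, pipe_partition and pipe_all are likewise illustrative. The all-against-all pairs are split into partitions and each partition is streamed through the external command over stdin/stdout, which is how Spark's RDD.pipe() treats a partition; a thread pool stands in for Spark executors.

```python
"""Sketch: stream all-against-all structure pairs through an external
aligner, one partition at a time, in the style of Spark's RDD.pipe().

Assumptions (hypothetical, not from the thesis): structures are named by
file, the aligner reads "a b" per stdin line and prints "a b score".
"""
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Hypothetical stand-in for a pairwise aligner wrapper (NOT TM-align).
# It reads "a b" per line and prints "a b score" with a dummy score
# derived only from the lengths of the two names.
ALIGNER_CMD = [
    sys.executable,
    "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    a, b = line.split()\n"
    "    print(a, b, f'{1.0 / (1 + abs(len(a) - len(b))):.3f}')\n",
]


def all_pairs(structures):
    """Every unordered pair of structures, N*(N-1)/2 records in total."""
    return [f"{a} {b}" for a, b in combinations(structures, 2)]


def partition(records, num_partitions):
    """Round-robin split of records; empty partitions are dropped."""
    parts = [[] for _ in range(num_partitions)]
    for i, rec in enumerate(records):
        parts[i % num_partitions].append(rec)
    return [p for p in parts if p]


def pipe_partition(records, cmd=ALIGNER_CMD):
    """Feed one partition to the external command via stdin, one record
    per line, and return its stdout lines (what RDD.pipe() does)."""
    result = subprocess.run(
        cmd,
        input="\n".join(records) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def pipe_all(structures, num_partitions=4, workers=4):
    """Run partitions concurrently and collect scores keyed by pair.
    Threads suffice because the work happens in child processes."""
    parts = partition(all_pairs(structures), num_partitions)
    scores = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for lines in pool.map(pipe_partition, parts):
            for line in lines:
                a, b, s = line.split()
                scores[(a, b)] = float(s)
    return scores


if __name__ == "__main__":
    names = ["1abc.pdb", "2def.pdb", "3ghij.pdb", "4kl.pdb"]
    for pair, score in sorted(pipe_all(names).items()):
        print(pair, score)
```

In PySpark the same pattern would be roughly sc.parallelize(pairs, n).pipe(cmd), with cmd given as a command string. Because the aligner only communicates over stdin and stdout, it runs unmodified, which is consistent with the abstract's aim of keeping changes to the original tool minimal.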

Keywords/Search Tags:

Protein structure alignment, Spark, Multi-thread, Container

Related items

1	Research On Container Health Prediction And Multi-Objective Maintenance Decision Technology Based On Big Data
2	The Parallelization Research Of Genomics Data Comparison Algorithm And The Construction Of Comparison Platform Based On Spark
3	Cloud Container System For Protein Structure Prediction Design And Implementation
4	Protein Structure Alignment Based On Secondary Structure Elements
5	The Method And Application Of Protein Structure Alignment
6	Protein Structure Alignment Methods Based On AFPs
7	A protein structure alignment method and application to the discovery of recurrent protein structure motifs
8	Optimization And Transplantation Of Multi-reference Gene Short Sequence Alignment Tool MUGI
9	Improving VAST structure alignment performance and analysis of small molecule contacts in protein structures
10	Non-sequential Protein Structure Alignment Based On AFPs