
Research on Task Modeling and Scheduling of Gene Sequencing Workflow

Posted on: 2017-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: D Lv
GTID: 2310330533966445
Subject: Communication and Information System
Abstract/Summary:
Biological gene sequencing depends heavily on high performance computing systems. To make full use of system resources and ensure that sequencing jobs execute quickly and efficiently, an appropriate scheduling policy is essential. Based on an analysis of the logs of the high performance computing system of the Shenzhen Huada Gene Research Institute (BGI), this thesis studies scheduling strategies for gene sequencing tasks.

The thesis first analyzes the trace log of BGI's high performance computing system: it extracts the effective tasks, mines the rules by which host groups deliver tasks, analyzes task characteristics and attributes, and forms a task log of the sequencing workflow. On this basis, it analyzes the workflow task characteristics and each attribute of a single task, and fits them with appropriate probability distributions. Finally, a complete workflow task model is generated and implemented on the simulation system GridSim. The experimental results show that the workflow distribution generated by the task model is consistent with the distribution in the actual log.

Based on the task model and the original FCFS scheduling strategy of the BGI computing system, the thesis presents a workflow FCFS scheduling strategy that supports biological gene sequencing. In addition, the simple backfilling algorithm (EASY backfilling) is analyzed and improved through the following three strategies: (1) Predicting task running time to improve the accuracy of backfilling. The running time is predicted by weighting the running times in the user's history data and the user's requested running time. (2) Setting multiple priorities for workflow tasks, so that the backfilling strategy can adapt to workflow scheduling. (3) Improving the task selection strategy by adding a resource load factor and a task category matching factor, so that it not only meets the basic requirements of backfilling but also takes account of host loads to achieve load balance.

The experimental platform is built on the GridSim framework and implements the FCFS, workflow FCFS, and improved backfilling algorithms. The simulation results show that the proposed improved backfilling algorithm not only schedules workflow tasks but also achieves load balancing effectively, and that the improved scheduling strategy effectively reduces the waiting time of tasks.
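
The abstract does not give the prediction formula or the backfilling test, so the following Python sketch is only one plausible reading of strategy (1) combined with the textbook EASY backfilling rule. It is not the thesis's GridSim implementation. The names `predict_runtime`, `can_backfill`, `Job`, `alpha`, `decay`, `shadow_time`, and `extra_cores` are all hypothetical, as is the choice of an exponentially decayed average over the user's history.

```python
from dataclasses import dataclass
from typing import List


def predict_runtime(history: List[float], requested: float,
                    alpha: float = 0.5, decay: float = 0.5) -> float:
    """Blend a user's past running times with the requested running time.

    Assumption: `history` is ordered oldest to newest, and newer runs
    count more (weight 1, decay, decay**2, ...). `alpha` is the share
    given to history; the rest goes to the user's request.
    """
    if not history:
        # No history for this user: fall back to the request.
        return requested
    recent_first = list(reversed(history))
    weights = [decay ** i for i in range(len(recent_first))]
    hist_avg = sum(w * t for w, t in zip(weights, recent_first)) / sum(weights)
    return alpha * hist_avg + (1 - alpha) * requested


@dataclass
class Job:
    name: str
    cores: int
    predicted_runtime: float


def can_backfill(job: Job, now: float, free_cores: int,
                 shadow_time: float, extra_cores: int) -> bool:
    """Standard EASY backfilling test for one waiting job.

    shadow_time: earliest time the head-of-queue job can start.
    extra_cores: cores still free at shadow_time after the head job
                 has taken its reservation.
    """
    if job.cores > free_cores:
        return False
    # Either the job ends before the reservation begins ...
    finishes_in_time = now + job.predicted_runtime <= shadow_time
    # ... or it only uses cores the head job will not need.
    return finishes_in_time or job.cores <= extra_cores
```

In this sketch a more accurate `predicted_runtime` makes the `finishes_in_time` test less pessimistic than using the user's request directly, which is the motivation the abstract gives for runtime prediction. The multi-priority handling and the load and category matching factors of strategies (2) and (3) are not modeled here.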
Keywords/Search Tags:trace log, task modeling, backfilling strategy, load balancing, gene sequencing workflow