
Research On Task Scheduling Policy In Bioinformatics Grid Environments

Posted on: 2010-07-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W C Jiang
Full Text: PDF
GTID: 1118360275986630
Subject: Computer application technology
Abstract/Summary:
Bioinformatics, which focuses on creating and developing advanced computational techniques to manage and extract useful information from DNA, RNA, and protein sequences, is fast emerging as an important discipline for life science research. Bioinformatics applications are extremely computation- or data-intensive, which motivates the use of Grid technology. However, some functional modules of the grid, such as resource management, scheduling, and load balancing, must be adapted to bioinformatics applications, which are computation- or data-intensive, data-independent, coarse-grained, and cooperative.

A novel bipartite model for resource management is proposed, based on a detailed analysis of grid resource management modes. This model lets grid users observe grid resources in terms of both low-level physical characteristics and high-level application characteristics. Based on the bipartite model, a novel grid service scheduling policy, a load balancing method based on min-cost and max-flow channels, and a dynamic combinatorial optimization method for grid services are presented to enhance the global performance of the grid platform.

Complicated bioinformatics applications usually include multiple sub-tasks that need to interact with each other to accomplish the whole task. A set of optimal services that satisfy certain performance constraints is invoked to carry out such complicated tasks. Coordinating the optimal invocation of these services is therefore important to increase responsiveness and to ensure optimal application execution and system usage in general.
We present a method called SP2SP, a 2-level grid Service Scheduling Policy based on Logical Subnet Partitioning, which tackles the service scheduling problem in Bio-Grid environments in three steps: 1) a similarity-based logical subnet partitioning algorithm that classifies individual services into different subsets according to similarity constraints based on performance metrics; 2) a requirement-based prediction algorithm that maps bioinformatics applications composed of multiple sub-tasks onto the optimal subnet; and 3) a multi-priority-queue-based service scheduling algorithm, used inside each subnet, that allocates each sub-task to an optimal physical service within the subnet. Comprehensive experiments on the sub-grid platform of NPPC are performed to evaluate the proposed SP2SP mechanism. The results show that SP2SP outperforms other scheduling algorithms. In particular, SP2SP performs best in scenarios where a group of tasks has similar resource requirements or needs to cooperate to obtain better overall performance.

To balance load among all grid nodes, a bipartite model for load balancing (LB) in grid computing environments, called the Transverse viewpoint based Bi-Tier model (TBT), is proposed. TBT can efficiently eliminate topology mismatch between the overlay and physical networks during the load transfer process. As an implementation of TBT, a novel LB policy called M²ON (Min-cost and Max-flow Channel based Overlay Network) is presented. In M²ON, communication capability is denoted as M²C (Min-cost and Max-flow Channel), which is obtained using a Labeled Tree Probing (LTP) method. Computing capacity is denoted as the Idle Factor (IF), which is obtained from the semantic overlay. The higher- and lower-level characteristics are combined into an Integrated Impacting Factor (ⅡF) using a Double Linear Inserting (DLI) function.
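The abstract does not define the Double Linear Inserting function. One plausible reading, offered here only as an assumption, is that it resembles bilinear interpolation over two normalized inputs. The sketch below combines a communication score and an idle factor, both assumed to be scaled to [0, 1], into a single value. The function names and the corner weights are hypothetical and are not taken from the dissertation.

```python
def bilinear(x, y, f00, f10, f01, f11):
    """Bilinear interpolation on the unit square.

    f00, f10, f01, f11 are the values at corners (0,0), (1,0), (0,1), (1,1).
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("x and y must be normalized to [0, 1]")
    return (f00 * (1 - x) * (1 - y)
            + f10 * x * (1 - y)
            + f01 * (1 - x) * y
            + f11 * x * y)


def integrated_impacting_factor(m2c, idle_factor,
                                corners=(0.0, 0.4, 0.6, 1.0)):
    """Combine a normalized communication score (m2c) and a normalized
    idle factor into one score.

    The corner values are illustrative assumptions: they weight idle
    computing capacity slightly more than communication capability.
    """
    return bilinear(m2c, idle_factor, *corners)
```

Under these assumed corner values, a node with the best channel and a fully idle processor scores 1.0, and a node with neither scores 0.0. Intermediate nodes are ranked by the interpolated value.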
Based on ⅡF, optimal topology matching can be achieved in the LB process. Extensive experiments and simulations have been performed and are discussed. The results show that M²ON achieves more accurate topology matching with a minimal increase in overall locating time, while achieving higher overall system performance.

Based on the theory and research results described above, a bioinformatics grid platform called H-BioGrid is designed and constructed. This platform can integrate any hardware, software, and data resources that join it. Some bioinformatics applications and databases developed in our lab are already deployed on H-BioGrid and are freely accessible to bioinformatics researchers worldwide.
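The three SP2SP steps named in the abstract (similarity-based subnet partitioning, requirement-based subnet selection, and priority-queue scheduling inside a subnet) can be illustrated with a minimal sketch. This is not the dissertation's algorithm. It assumes services are described by numeric metric vectors, uses a greedy Euclidean-distance threshold for partitioning, picks the subnet with the nearest centroid, and uses a single heap keyed by priority in place of the multi-priority queue. All class and function names are hypothetical.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    metrics: tuple      # assumed performance metrics, e.g. (cpu, bandwidth)
    load: float = 0.0   # accumulated cost of tasks assigned so far


def partition(services, threshold):
    """Step 1 (assumed form): greedy similarity-based partitioning.

    A service joins the first subnet whose first member lies within
    `threshold` (Euclidean distance); otherwise it starts a new subnet.
    """
    subnets = []
    for s in services:
        for subnet in subnets:
            if math.dist(s.metrics, subnet[0].metrics) <= threshold:
                subnet.append(s)
                break
        else:
            subnets.append([s])
    return subnets


def centroid(subnet):
    """Mean metric vector of a subnet."""
    dims = len(subnet[0].metrics)
    return tuple(sum(s.metrics[i] for s in subnet) / len(subnet)
                 for i in range(dims))


def choose_subnet(subnets, requirement):
    """Step 2 (assumed form): pick the subnet whose centroid is closest
    to the application's requirement vector."""
    return min(subnets, key=lambda sn: math.dist(centroid(sn), requirement))


def schedule(subnet, tasks):
    """Step 3 (assumed form): pop tasks in priority order (lower number
    first) and assign each to the currently least-loaded service.

    `tasks` is a list of (priority, task_name, cost) tuples.
    """
    queue = list(tasks)
    heapq.heapify(queue)
    assignment = {}
    while queue:
        _priority, task_name, cost = heapq.heappop(queue)
        target = min(subnet, key=lambda s: s.load)
        target.load += cost
        assignment[task_name] = target.name
    return assignment
```

A typical call sequence would be `subnets = partition(services, threshold)`, then `subnet = choose_subnet(subnets, requirement)`, then `schedule(subnet, tasks)`. Separating subnet selection from in-subnet scheduling mirrors the two levels the abstract describes, although the actual similarity constraints and prediction algorithm are not specified there.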
Keywords/Search Tags: Bioinformatics, Grid, Resource Management, Task Scheduling, Load Balancing, Service Flow, Combinatorial Optimization