
The Design And Implementation Of Data-aware-based Scheduling Sub System In Job Management System

Posted on: 2013-11-07
Degree: Master
Type: Thesis
Country: China
Candidate: X J Li
GTID: 2268330392469535
Subject: Software engineering
Abstract/Summary:
With the emergence of multi-cluster environments that span multiple physical locations, restrictions on data access and a lack of data management have left data-sensitive jobs without effective management. This research develops a scheduling subsystem that works with the LSF job management system to manage data resources in a multi-cluster environment and to support efficient, accurate job scheduling.
This paper mainly implements data-aware scheduling: after a job is submitted to the cluster, the system dispatches it to an appropriate compute node according to the data the job needs and the data information known to the system. To achieve data-aware scheduling, this paper also implements data resource management within the cluster.
This paper designs and implements the dataset management module, the storage management module, the cache management module, the configuration management module, the data transfer module, the remote data management module and the scheduling module. The dataset management module and the storage management module manage datasets and storage, respectively. The cache management module caches data transferred between clusters and maintains the life cycle of the cached data. The configuration management module initializes system information from the user's profile configuration. The data transfer module transfers data between clusters. The remote data management module shares data information between clusters. The scheduling module makes scheduling decisions according to the data information.
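As an illustration only, the scheduling decision described above could be sketched as follows. This is a minimal sketch under stated assumptions, not the algorithm from the thesis and not an LSF API: the `Cluster` class, the `missing_bytes` and `choose_cluster` functions, and the rule "prefer the cluster that needs the least data transferred" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical view of one cluster as seen by the scheduling module."""
    name: str
    free_slots: int                               # job slots currently available
    datasets: set = field(default_factory=set)    # datasets stored or cached locally


def missing_bytes(cluster, required, sizes):
    """Total size of the required datasets that the cluster does not hold yet."""
    return sum(sizes[d] for d in required if d not in cluster.datasets)


def choose_cluster(clusters, required, sizes):
    """Pick a cluster with a free slot that needs the least data transferred.

    Ties are broken in favour of the cluster with more free slots.
    Returns None when no cluster has a free slot (the job stays pending).
    """
    candidates = [c for c in clusters if c.free_slots > 0]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda c: (missing_bytes(c, required, sizes), -c.free_slots),
    )
```

In this sketch, a job whose datasets are already present on some cluster is dispatched there without any transfer; otherwise the missing datasets would be handed to something like the data transfer module and then kept by the cache management module for later jobs.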
Through the interaction of these modules and the cooperation between the system and the job management system, data-aware scheduling is achieved, ensuring that data-sensitive jobs run successfully on the compute host under the system's management and coordination. Working together with the LSF job management system, the system lets users focus on the logic of their jobs rather than on how the data is distributed: the entire life cycle of a data-sensitive job is handled by computation and management inside the system, without user intervention, and the system provides cluster data and storage information to users. Overall, the system implements scheduling of data-sensitive jobs, makes data operations transparent to users, improves job running efficiency, and provides more complete data management capabilities.
Keywords/Search Tags: multi-cluster environment, data-aware, job scheduling, cache management, data-sensitive job