
The Design And Implementation Of Massive Job Submission And Management System In Distributed Computing

Posted on: 2017-04-07
Degree: Master
Type: Thesis
Country: China
Candidate: B Suo
Full Text: PDF
GTID: 2308330488951955
Subject: Computer Science and Technology
Abstract/Summary:
With the continuous expansion of current High Energy Physics (HEP) experiments, data volumes are accumulating rapidly, posing a great challenge to computing resources. Distributed computing has become the inevitable form of resource organization in HEP experiments. To meet the needs of experimental data processing and analysis, the HEP resource model has shifted from local clusters to a diverse mix of clusters, grids, clouds, and volunteer communities. These resources are heterogeneous, geographically distributed, and varied, which makes it hard for users to access multiple back-ends. At the same time, experimental data processing and analysis jobs often involve huge data volumes and time-consuming operations, so they must be split into massive numbers of sub-jobs and distributed to many computing elements for execution. Moreover, HEP data processing shares common features across experiments, which means that different jobs can be handled within a generic framework.

To address the heterogeneity of back-ends in the HEP computing model and to unify job processing across multiple experiments, this paper designs and develops a common front-end job submission and management system. The system allows different user groups to split and submit jobs through a unified interface, reducing the management burden of supporting multiple experiments. Meanwhile, it encapsulates the interfaces of multiple back-end scheduling systems, hiding the differences among the underlying resources so that users can access heterogeneous resources transparently at a lower cost of use.

The main work of this paper is as follows:

(1) Analyze the jobs of multiple experiments, such as BESIII, CEPC, and JUNO, and abstract their common job processing; summarize the functional requirements of the system and design its function modules.

(2) Investigate related work, analyzing and generalizing the design features of existing front-end systems such as Ganga and ILCDirac; adopt their strengths to design the core components and basic framework of the system, and optimize the job splitter and workflow (a splitter sketch is given after this abstract).

(3) Study the features and usage of back-ends such as DIRAC and HTCondor and design the interfaces to them; provide a unified operation UI so that users only need to define a few parameters to access resources or switch between them, without attending to complex back-end instructions (see the back-end interface sketch below).

(4) Implement job monitoring based on the job status feedback of the back-ends, storing the status information in a local database and updating it in response to user actions (see the monitoring sketch below).

(5) Interact with the metadata management system to locate and access target data quickly and to manage input and output data effectively (see the catalogue sketch below).

This system makes it convenient to manage the whole lifecycle of massive jobs in distributed computing, including splitting, submission, execution, status monitoring, rescheduling, and the registration and reading of datasets. It meets the fundamental demands of job processing; it unifies the processing flow and provides a single interface for users, giving it great versatility and usability. It provides a generic job submission and management framework that reduces the coupling between general components and experiment-specific data, so that experiment-related applications can be integrated into the system through extensions. The resulting architecture has a clear hierarchy and is convenient for secondary development for new experiments. This paper introduces the system comprehensively, covering the analysis of job processing, the design of its functions and structure, a comparison with related software, and its support for actual experimental applications.
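To illustrate the job-splitting step mentioned in (2), the following is a minimal sketch, assuming a hypothetical splitter that divides a long run of events into equal-sized sub-jobs. The names (SubJob, split_by_events, events_per_subjob) are illustrative and are not the thesis's actual API.

```python
# Minimal sketch of an event-range job splitter (hypothetical names,
# not the thesis's actual API). A long analysis job over many events
# is divided into sub-jobs, each covering a contiguous event range.

class SubJob:
    """One unit of work: a contiguous event range from one parent job."""
    def __init__(self, job_name, first_event, last_event):
        self.job_name = job_name
        self.first_event = first_event
        self.last_event = last_event

    def __repr__(self):
        return f"<SubJob {self.job_name} [{self.first_event}, {self.last_event}]>"


def split_by_events(job_name, total_events, events_per_subjob):
    """Split a job of total_events into sub-jobs of at most events_per_subjob."""
    subjobs = []
    for first in range(0, total_events, events_per_subjob):
        last = min(first + events_per_subjob, total_events) - 1
        subjobs.append(SubJob(job_name, first, last))
    return subjobs


if __name__ == "__main__":
    # e.g. a 1,000,000-event simulation split into 10,000-event sub-jobs
    print(len(split_by_events("juno_sim_001", 1_000_000, 10_000)))  # -> 100
```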
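To picture how a front-end can hide back-end differences as described in (3), here is a minimal sketch of a unified scheduler interface. The class design is an assumption for illustration only; condor_submit/condor_q/condor_rm and dirac-wms-job-submit/-status/-kill are the standard HTCondor and DIRAC command-line tools, but a real implementation would likely use the schedulers' Python APIs and parse their output.

```python
# Sketch of a unified back-end interface (illustrative design, not the
# thesis's actual classes). Each back-end wraps its own scheduler
# commands behind the same three operations, so the upper layers never
# see which scheduler is in use.

import subprocess
from abc import ABC, abstractmethod


class Backend(ABC):
    """Common interface that hides scheduler-specific details."""

    @abstractmethod
    def submit(self, description_file):
        """Submit a job described in a scheduler-specific file."""

    @abstractmethod
    def status(self, job_id):
        """Return raw status output for one job."""

    @abstractmethod
    def kill(self, job_id):
        """Cancel one job."""


class HTCondorBackend(Backend):
    # condor_submit / condor_q / condor_rm are the standard HTCondor
    # tools; output parsing is omitted in this sketch.
    def submit(self, description_file):
        return subprocess.run(["condor_submit", description_file],
                              capture_output=True, text=True).stdout

    def status(self, job_id):
        return subprocess.run(["condor_q", str(job_id)],
                              capture_output=True, text=True).stdout

    def kill(self, job_id):
        return subprocess.run(["condor_rm", str(job_id)],
                              capture_output=True, text=True).stdout


class DiracBackend(Backend):
    # dirac-wms-job-* are standard DIRAC WMS command-line tools; a real
    # system would more likely call the DIRAC Python API directly.
    def submit(self, description_file):
        return subprocess.run(["dirac-wms-job-submit", description_file],
                              capture_output=True, text=True).stdout

    def status(self, job_id):
        return subprocess.run(["dirac-wms-job-status", str(job_id)],
                              capture_output=True, text=True).stdout

    def kill(self, job_id):
        return subprocess.run(["dirac-wms-job-kill", str(job_id)],
                              capture_output=True, text=True).stdout
```

With this shape, a user-facing layer can select the back-end from a single configuration parameter, which is one way to realize the abstract's claim that users "only need to define a few parameters to access resources or switch between them".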
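Point (4) caches back-end status feedback in a local database and refreshes it only when the user acts. A minimal sketch with sqlite3 follows; the table name and schema are assumptions, and the backend argument stands in for the Backend interface from the previous sketch.

```python
# Sketch of pull-style job monitoring (hypothetical schema): job status
# is cached in a local SQLite database and refreshed from the back-end
# only when the user requests it, avoiding constant polling.

import sqlite3


def init_db(path="jobs.db"):
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS job_status (
                        job_id     TEXT PRIMARY KEY,
                        status     TEXT,
                        updated_at TEXT DEFAULT CURRENT_TIMESTAMP)""")
    return conn


def refresh_status(conn, backend, job_ids):
    """Driven by a user action: query the back-end and update the cache."""
    for job_id in job_ids:
        status = backend.status(job_id)  # Backend from the sketch above
        conn.execute(
            """INSERT INTO job_status (job_id, status, updated_at)
               VALUES (?, ?, CURRENT_TIMESTAMP)
               ON CONFLICT(job_id) DO UPDATE
               SET status = excluded.status,
                   updated_at = CURRENT_TIMESTAMP""",
            (job_id, status))
    conn.commit()


def show_status(conn):
    """Read the cached view without touching the back-end."""
    return conn.execute(
        "SELECT job_id, status, updated_at FROM job_status").fetchall()
```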
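For point (5), the metadata interaction can be pictured as resolving a dataset name to concrete files before splitting, and registering outputs back afterwards. The catalogue API below is entirely hypothetical, a stand-in for the real metadata management system.

```python
# Hypothetical metadata-catalogue interaction (names and API invented
# for illustration): resolve a dataset to its files before splitting,
# and register output files after jobs finish.

class MetadataCatalogue:
    """Toy in-memory catalogue standing in for a real metadata service."""
    def __init__(self):
        self._datasets = {}

    def register(self, dataset, files):
        """Record output files under a dataset name."""
        self._datasets.setdefault(dataset, []).extend(files)

    def locate(self, dataset):
        """Return the list of files belonging to a dataset."""
        return list(self._datasets.get(dataset, []))


if __name__ == "__main__":
    cat = MetadataCatalogue()
    cat.register("bes3/rec/round01", ["/data/rec/run001.root",
                                      "/data/rec/run002.root"])
    # Input files feed the splitter; outputs are registered the same way.
    print(cat.locate("bes3/rec/round01"))
```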
Keywords/Search Tags: High-Energy Physics, distributed computing, massive job processing, front-end, versatility