
The Design And Implementation Of Massive Job Submission And Management System In Distributed Computing

Posted on: 2017-04-07
Degree: Master
Type: Thesis
Country: China
Candidate: B Suo
Full Text: PDF
GTID: 2308330488951955
Subject: Computer Science and Technology
Abstract/Summary:
With the continuous expansion of current High Energy Physics (HEP) experiments, data volumes are accumulating rapidly, posing a great challenge to computing resources. Distributed computing has become the inevitable form of resource organization in HEP experiments. To meet the needs of experimental data processing and analysis, the HEP resource model has shifted from local clusters to a diverse mix of clusters, grids, clouds, and volunteer communities. These resources are heterogeneous, geographically distributed, and varied, which makes it hard for users to access multiple back-ends. At the same time, experimental data processing and analysis jobs often involve huge data volumes and time-consuming operations, so they must be split into massive numbers of sub-jobs and distributed to many computing elements for execution. Moreover, HEP data processing shares common features across experiments, which means that different jobs can be handled within a generic framework.

To address the heterogeneity of back-ends in the HEP computing model and to unify job processing across multiple experiments, this paper designs and develops a common front-end job submission and management system. The system allows different user groups to split and submit jobs through a unified interface, reducing the management burden of supporting multiple experiments. Meanwhile, it encapsulates the interfaces of multiple back-end scheduling systems, hiding the differences among the underlying resources so that users can access heterogeneous resources transparently at a lower cost of use.

The main work of this paper is as follows:

(1) Analyze the jobs of multiple experiments, such as BESIII, CEPC, and JUNO, and abstract their common job processing; summarize the functional requirements of the system and design its function modules.

(2) Investigate related work, analyzing and generalizing the design features of existing front-end systems such as Ganga and ILCDirac; adopt their strengths to design the core components and basic framework of the system, and optimize the job splitter and workflow (a splitter sketch is given after this abstract).

(3) Study the features and usage of back-ends such as DIRAC and HTCondor and design the interfaces to them; provide a unified operation UI so that users only need to define a few parameters to access resources or switch between them, without attending to complex back-end instructions (see the back-end interface sketch below).

(4) Implement job monitoring based on the job status feedback of the back-ends, storing the status information in a local database and updating it in response to user actions (see the monitoring sketch below).

(5) Interact with the metadata management system to locate and access target data quickly and to manage input and output data effectively (see the catalogue sketch below).

This system makes it convenient to manage the whole lifecycle of massive jobs in distributed computing, including splitting, submission, execution, status monitoring, rescheduling, and the registration and reading of datasets. It meets the fundamental demands of job processing; it unifies the processing flow and provides a single interface for users, giving it great versatility and usability. It provides a generic job submission and management framework that reduces the coupling between general components and experiment-specific data, so that experiment-related applications can be integrated into the system through extensions. The resulting architecture has a clear hierarchy and is convenient for secondary development for new experiments. This paper introduces the system comprehensively, covering the analysis of job processing, the design of its functions and structure, a comparison with related software, and its support for actual experimental applications.
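To illustrate the job-splitting step mentioned in (2), the following is a minimal sketch, assuming a hypothetical splitter that divides a long run of events into equal-sized sub-jobs. The names (SubJob, split_by_events, events_per_subjob) are illustrative and are not the thesis's actual API.

```python
# Minimal sketch of an event-range job splitter (hypothetical names,
# not the thesis's actual API). A long analysis job over many events
# is divided into sub-jobs, each covering a contiguous event range.

class SubJob:
    """One unit of work: a contiguous event range from one parent job."""
    def __init__(self, job_name, first_event, last_event):
        self.job_name = job_name
        self.first_event = first_event
        self.last_event = last_event

    def __repr__(self):
        return f"<SubJob {self.job_name} [{self.first_event}, {self.last_event}]>"


def split_by_events(job_name, total_events, events_per_subjob):
    """Split a job of total_events into sub-jobs of at most events_per_subjob."""
    subjobs = []
    for first in range(0, total_events, events_per_subjob):
        last = min(first + events_per_subjob, total_events) - 1
        subjobs.append(SubJob(job_name, first, last))
    return subjobs


if __name__ == "__main__":
    # e.g. a 1,000,000-event simulation split into 10,000-event sub-jobs
    print(len(split_by_events("juno_sim_001", 1_000_000, 10_000)))  # -> 100
```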
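To picture how a front-end can hide back-end differences as described in (3), here is a minimal sketch of a unified scheduler interface. The class design is an assumption for illustration only; condor_submit/condor_q/condor_rm and dirac-wms-job-submit/-status/-kill are the standard HTCondor and DIRAC command-line tools, but a real implementation would likely use the schedulers' Python APIs and parse their output.

```python
# Sketch of a unified back-end interface (illustrative design, not the
# thesis's actual classes). Each back-end wraps its own scheduler
# commands behind the same three operations, so the upper layers never
# see which scheduler is in use.

import subprocess
from abc import ABC, abstractmethod


class Backend(ABC):
    """Common interface that hides scheduler-specific details."""

    @abstractmethod
    def submit(self, description_file):
        """Submit a job described in a scheduler-specific file."""

    @abstractmethod
    def status(self, job_id):
        """Return raw status output for one job."""

    @abstractmethod
    def kill(self, job_id):
        """Cancel one job."""


class HTCondorBackend(Backend):
    # condor_submit / condor_q / condor_rm are the standard HTCondor
    # tools; output parsing is omitted in this sketch.
    def submit(self, description_file):
        return subprocess.run(["condor_submit", description_file],
                              capture_output=True, text=True).stdout

    def status(self, job_id):
        return subprocess.run(["condor_q", str(job_id)],
                              capture_output=True, text=True).stdout

    def kill(self, job_id):
        return subprocess.run(["condor_rm", str(job_id)],
                              capture_output=True, text=True).stdout


class DiracBackend(Backend):
    # dirac-wms-job-* are standard DIRAC WMS command-line tools; a real
    # system would more likely call the DIRAC Python API directly.
    def submit(self, description_file):
        return subprocess.run(["dirac-wms-job-submit", description_file],
                              capture_output=True, text=True).stdout

    def status(self, job_id):
        return subprocess.run(["dirac-wms-job-status", str(job_id)],
                              capture_output=True, text=True).stdout

    def kill(self, job_id):
        return subprocess.run(["dirac-wms-job-kill", str(job_id)],
                              capture_output=True, text=True).stdout
```

With this shape, a user-facing layer can select the back-end from a single configuration parameter, which is one way to realize the abstract's claim that users "only need to define a few parameters to access resources or switch between them".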
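Point (4) caches back-end status feedback in a local database and refreshes it only when the user acts. A minimal sketch with sqlite3 follows; the table name and schema are assumptions, and the backend argument stands in for the Backend interface from the previous sketch.

```python
# Sketch of pull-style job monitoring (hypothetical schema): job status
# is cached in a local SQLite database and refreshed from the back-end
# only when the user requests it, avoiding constant polling.

import sqlite3


def init_db(path="jobs.db"):
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS job_status (
                        job_id     TEXT PRIMARY KEY,
                        status     TEXT,
                        updated_at TEXT DEFAULT CURRENT_TIMESTAMP)""")
    return conn


def refresh_status(conn, backend, job_ids):
    """Driven by a user action: query the back-end and update the cache."""
    for job_id in job_ids:
        status = backend.status(job_id)  # Backend from the sketch above
        conn.execute(
            """INSERT INTO job_status (job_id, status, updated_at)
               VALUES (?, ?, CURRENT_TIMESTAMP)
               ON CONFLICT(job_id) DO UPDATE
               SET status = excluded.status,
                   updated_at = CURRENT_TIMESTAMP""",
            (job_id, status))
    conn.commit()


def show_status(conn):
    """Read the cached view without touching the back-end."""
    return conn.execute(
        "SELECT job_id, status, updated_at FROM job_status").fetchall()
```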
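For point (5), the metadata interaction can be pictured as resolving a dataset name to concrete files before splitting, and registering outputs back afterwards. The catalogue API below is entirely hypothetical, a stand-in for the real metadata management system.

```python
# Hypothetical metadata-catalogue interaction (names and API invented
# for illustration): resolve a dataset to its files before splitting,
# and register output files after jobs finish.

class MetadataCatalogue:
    """Toy in-memory catalogue standing in for a real metadata service."""
    def __init__(self):
        self._datasets = {}

    def register(self, dataset, files):
        """Record output files under a dataset name."""
        self._datasets.setdefault(dataset, []).extend(files)

    def locate(self, dataset):
        """Return the list of files belonging to a dataset."""
        return list(self._datasets.get(dataset, []))


if __name__ == "__main__":
    cat = MetadataCatalogue()
    cat.register("bes3/rec/round01", ["/data/rec/run001.root",
                                      "/data/rec/run002.root"])
    # Input files feed the splitter; outputs are registered the same way.
    print(cat.locate("bes3/rec/round01"))
```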
Keywords/Search Tags: High-Energy Physics, distributed computing, massive job processing, front-end, versatility