Design And Implementation Of A Resource Scheduler Based On Spark-on-EGO

Posted on:2018-05-21

Degree:Master

Type:Thesis

Country:China

Candidate:M Zhang

Full Text:PDF

GTID:2348330521451170

Subject:Engineering

Abstract/Summary:

In recent years, rising living standards and the rapid development of computer technology have produced a large amount of data. How to store and process this data has become a hot issue, which gave birth to the Big Data industry. The key to profitability in the Big Data industry is processing the data and drawing meaningful conclusions from it. Storage and computation at this scale depend on Cloud Computing, which provides the necessary computing power. Multi-tenancy is a popular Cloud Computing scenario in which users share the same service instance. When many tenants use cluster computing services and submit jobs to the cluster, several questions become research hot spots: how to manage large-scale cloud computing resources efficiently, how to use an effective resource allocation strategy so that each job obtains the resources it needs, how to avoid competition for resources between jobs as far as possible, and how to improve the job success rate.

Through a study of the big data processing platform Spark and of Spark on YARN and Spark on Mesos, the shortcomings of these three open source architectures were identified. In addition, during the use and study of EGO, it was found that EGO limits the number of connections between the Driver and the Consumer submitted by the user. Once this limit is exceeded, the performance of EGO drops. It is therefore necessary to find a strategy that not only improves the job success rate and the number of concurrent operations, but also keeps the performance of EGO within the normal range.

Based on these needs, this thesis designs a dynamic resource scheduling strategy named Dynamic Tag and implements a resource scheduler that uses it. The user can dynamically modify the resource configuration through the REST interface provided by the scheduler, and the scheduler applies that configuration so that jobs submitted by the user obtain resources as specified. The scheduler is composed of five modules: the resource configuration acquisition module, the parsing and calculating module, the Delegator module, the Policy module and the Resource Allocator module. Through the coordination of these modules, the resource scheduler manages the entire lifecycle of applications and enables tenants with different resource requirements in the multi-tenant scenario to obtain the corresponding resources and run successfully. At the same time, it improves the job success rate and the load capacity of the Spark-on-EGO cluster, and it makes resource configuration items easier to change dynamically, because modified items take effect without restarting the cluster.

Finally, by constructing an experimental environment and designing an experimental procedure, this thesis verifies the basic functions of the resource scheduler: registering a job with the scheduler, calculating and allocating resources for an application, and removing an application that has ended. The experimental results show that the resource scheduler implemented in this thesis runs normally.
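
The lifecycle that the experiments verify (register a job, calculate and allocate resources, remove the ended application) together with configuration changes that take effect without a restart can be illustrated with a minimal sketch. This is not the thesis's implementation and not the Dynamic Tag strategy itself, whose details the abstract does not give. The class `ToyScheduler`, its methods, the per-tenant slot cap and the notion of a "slot" are all hypothetical names assumed for illustration, and an in-memory dictionary stands in for the REST-updated configuration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Hypothetical in-memory stand-in for a multi-tenant resource scheduler."""

    total_slots: int
    # Assumed configuration model: a slot cap per tenant, editable at runtime.
    tenant_caps: dict = field(default_factory=dict)
    # app_id -> (tenant, slots currently granted)
    apps: dict = field(default_factory=dict)

    def update_config(self, tenant: str, cap: int) -> None:
        # The new cap is read on the next allocation, so no restart is needed.
        self.tenant_caps[tenant] = cap

    def used(self, tenant=None) -> int:
        # Slots held by one tenant, or by all tenants when tenant is None.
        return sum(s for t, s in self.apps.values() if tenant is None or t == tenant)

    def register(self, app_id: str, tenant: str) -> None:
        self.apps[app_id] = (tenant, 0)

    def allocate(self, app_id: str, requested: int) -> int:
        tenant, held = self.apps[app_id]
        cap_left = self.tenant_caps.get(tenant, 0) - self.used(tenant)
        free = self.total_slots - self.used()
        # Grant no more than the request, the tenant's remaining cap, or the free pool.
        granted = max(0, min(requested, cap_left, free))
        self.apps[app_id] = (tenant, held + granted)
        return granted

    def remove(self, app_id: str) -> None:
        # Dropping the application releases every slot it held.
        self.apps.pop(app_id, None)


scheduler = ToyScheduler(total_slots=10)
scheduler.update_config("tenant-a", 4)
scheduler.register("app-1", "tenant-a")
first_grant = scheduler.allocate("app-1", 6)   # limited by the cap of 4
scheduler.update_config("tenant-a", 8)         # raised while the scheduler keeps running
second_grant = scheduler.allocate("app-1", 6)  # limited by the 4 slots left under the new cap
scheduler.remove("app-1")
```

In this sketch the configuration is consulted at allocation time rather than cached at startup, which is one simple way to let a modified item take effect immediately; how the actual scheduler achieves this is not stated in the abstract.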

Keywords/Search Tags:

Big Data, Cloud Computing, Spark, Resource Scheduler, Multi-tenants Architecture

Related items

1	Research On Optimization Mechanism Of Containerized Spark Resource Scheduling In Cloud Environment
2	Resource Management System For Big Data Cloud Platform
3	Research On Cluster Analysis Of Biomedical Patent Data In Yunnan Province Based On Spark Cloud Computing Architecture
4	Research And Application Of Data Management And Store For Multi-tenants SaaS Applications
5	Research On Multi-tenants Based Policy-driven Software Defined Networking In Cloud Computing Environment
6	Research On Data Storge Schema And Mapping Mechanism In Multi-Tenants Environment
7	On GPU Resource Management And Scheduling Extension For Spark Platform
8	Research On Large-scale Handwriting Data Analysis Platform Based On Cloud Computing Architecture And Its Application
9	Containerized Cloud Platform-Orientated Design And Implementation Of Resource Scheduler
10	Research On Elastic Data Placement For SAAS Multi-Tenant In Cloud Computing