| With the rapid growth in Internet users and application types, the volume of data handled by the Internet industry is expanding quickly, and distributed clusters are becoming increasingly important. As cloud computing technology has matured, distributed clusters have become the choice of almost all enterprises and organizations for solving data storage and processing problems. A distributed cluster serves as a platform for multiple jobs and must meet the requirements of different job types, including batch jobs and real-time jobs, each of which has its own quality-of-service metrics. A major challenge for distributed systems is scheduling batch jobs and real-time jobs together when cluster resources are tight: the scheduler should reduce the overtime ratio of real-time jobs while also reducing the completion time of batch jobs, so that quality of service improves. As an emerging virtualization technology, container technology has brought great convenience to information security and resource utilization and has been welcomed by major enterprises and organizations, which have begun to containerize their clusters. Existing scheduling policies in containerized clusters are coarse-grained. To reduce the overtime ratio of real-time jobs, they usually either kill a batch job outright or reserve cluster resources for real-time jobs in advance, but this increases the completion time of batch jobs and reduces cluster resource utilization. This thesis proposes UpPreempt, a scheduling strategy for containerized clusters that makes fine-grained, multi-container resource preemption decisions based on container suspension. When scheduling, UpPreempt prioritizes the more urgent real-time jobs and reclaims resources from the containers of running batch jobs. When deciding on a preemption, it considers the deadline and the resource usage of the preempted containers. The basic idea is to select multiple containers of running batch jobs and preempt a portion of the resources from them 
and allocate those resources to the real-time job. In this way, UpPreempt does not need to reserve resources for real-time jobs in advance, and the preempted batch jobs do not need to be re-executed, which ensures the quality of service of real-time jobs and improves the performance of batch jobs. This thesis implements a preemptive resource scheduling framework for container clusters on top of Hadoop YARN and evaluates UpPreempt on a cluster on Aliyun Cloud. The experimental results show that UpPreempt achieves a good balance between the overtime ratio of real-time jobs and the completion time of batch jobs while improving cluster resource utilization. |
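
The basic idea of taking part of the resources from several running batch containers, rather than killing one outright, can be illustrated with a minimal sketch. This is not the thesis's algorithm. The `BatchContainer` type, the `plan_partial_preemption` function, the memory-only resource model, and the "most idle memory first" ranking are all assumptions made for illustration, and the sketch models neither deadlines nor container suspension.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BatchContainer:
    """Hypothetical view of one running batch container (memory only)."""
    container_id: str
    allocated_mb: int  # memory currently granted to the container
    used_mb: int       # memory the container is actually using


def plan_partial_preemption(
    containers: List[BatchContainer], demand_mb: int
) -> Optional[Dict[str, int]]:
    """Return {container_id: MB to reclaim}, or None if demand cannot be met.

    Assumed policy: reclaim only memory a container is not using, starting
    with the containers that have the most idle memory.
    """
    plan: Dict[str, int] = {}
    remaining = demand_mb
    ranked = sorted(
        containers, key=lambda c: c.allocated_mb - c.used_mb, reverse=True
    )
    for c in ranked:
        if remaining <= 0:
            break
        headroom = c.allocated_mb - c.used_mb
        take = min(headroom, remaining)
        if take > 0:
            plan[c.container_id] = take
            remaining -= take
    # Only return a plan that fully covers the real-time job's demand.
    return plan if remaining <= 0 else None


if __name__ == "__main__":
    running = [
        BatchContainer("a", allocated_mb=4096, used_mb=1024),
        BatchContainer("b", allocated_mb=2048, used_mb=1536),
        BatchContainer("c", allocated_mb=4096, used_mb=3072),
    ]
    print(plan_partial_preemption(running, demand_mb=3584))
    # prints {'a': 3072, 'c': 512}
```

In this toy run the real-time job's 3584 MB demand is covered by shrinking two batch containers, and no batch container is killed. A fuller policy would presumably also weigh the deadline, as the abstract describes.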