| Recently, long-running containerized workloads (e.g., online cloud services and machine learning) have become increasingly prevalent in large-scale shared production clusters. Because the request loads of such long-running containers typically show time-varying patterns, container co-location can easily lead to resource contention, which creates resource hotspots in the cluster and degrades workload performance. Container scheduling is therefore essential to improving both cluster resource utilization and workload performance. Existing cluster schedulers mainly focus on optimizing either the short-term benefit of cluster load balancing or the initial placement of long-running containers on servers. They ignore long-term optimization and the substantial cost of container migration. As a result, such schedulers inevitably incur a noticeable number of invalid migrations (i.e., containers migrated back and forth between two servers), which cause serious service level objective (SLO) violations. In this thesis, we propose Tetris, a model predictive control (MPC)-based container scheduling strategy that judiciously migrates long-running workloads to balance cluster load. Specifically, we first build a discrete-time dynamic model to formulate container scheduling as a long-term optimization problem. Tetris then solves this problem with two key components: (1) a container resource predictor, which leverages time-series analysis methods to accurately predict container resource consumption; and (2) an MPC-based container scheduler, which jointly optimizes the load balancing and migration cost of long-running containers over a finite, sliding time window. We implement a prototype of Tetris on Kubernetes (K8s) and evaluate it with prototype experiments on Amazon EC2 and with large-scale simulations driven by the Alibaba cluster trace v2018. Experimental results show that, compared with state-of-the-art container scheduling strategies, Tetris improves the cluster load balancing degree by up to 77.8% without incurring any SLO violations and with acceptable runtime overhead. |
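The abstract describes the MPC scheduler only at a high level. As an illustration only, one receding-horizon step that trades predicted load balance against migration cost can be sketched as follows. The sketch rests on several assumptions that are not taken from Tetris itself. It models a single resource dimension. A hypothetical moving-average `forecast` stands in for the time-series predictor. Load imbalance is measured as the population standard deviation of per-server load. A hypothetical weight `lam` is charged per migration. The search considers only single-container moves.

```python
from statistics import pstdev


def forecast(history, horizon, window=3):
    """Hypothetical stand-in for the resource predictor: repeat the mean
    of the last `window` samples for every step of the horizon."""
    recent = history[-window:]
    level = sum(recent) / len(recent)
    return [level] * horizon


def imbalance(placement, demand, servers, t):
    """Assumed imbalance metric: population std-dev of per-server load at step t."""
    load = {s: 0.0 for s in servers}
    for container, server in placement.items():
        load[server] += demand[container][t]
    return pstdev(list(load.values()))


def window_cost(placement, demand, servers, horizon, migrations, lam):
    """Assumed MPC objective: imbalance summed over the prediction window
    plus a penalty of `lam` per migration."""
    total = sum(imbalance(placement, demand, servers, t) for t in range(horizon))
    return total + lam * migrations


def mpc_step(placement, histories, servers, horizon=4, lam=0.5):
    """One receding-horizon step: evaluate every single-container move over
    the predicted window and return the best (container, src, dst) move,
    or None if no move beats staying put."""
    demand = {c: forecast(h, horizon) for c, h in histories.items()}
    best_cost = window_cost(placement, demand, servers, horizon, 0, lam)
    best_move = None
    for container, src in placement.items():
        for dst in servers:
            if dst == src:
                continue
            trial = dict(placement)
            trial[container] = dst
            cost = window_cost(trial, demand, servers, horizon, 1, lam)
            if cost < best_cost:  # strict: ties keep the current placement
                best_cost, best_move = cost, (container, src, dst)
    return best_move


if __name__ == "__main__":
    servers = ["s1", "s2"]
    placement = {"a": "s1", "b": "s1", "c": "s2"}
    histories = {"a": [2, 2, 2], "b": [2, 2, 2], "c": [1, 1, 1]}
    print(mpc_step(placement, histories, servers, lam=0.5))  # cheap moves: migrate
    print(mpc_step(placement, histories, servers, lam=5.0))  # costly moves: stay
```

Only the single best move is returned. An MPC loop would apply that first decision, observe new usage, and re-solve over the shifted window. Raising `lam` suppresses migrations whose predicted balancing gain over the window is small. This is the kind of mechanism that discourages back-and-forth moves.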