
Performance Interference And Resource Allocation Optimization In Co-located Data Centers

Posted on: 2024-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: T Zhang
Full Text: PDF
GTID: 2558307103975149
Subject: Computer technology
Abstract/Summary:
As cloud computing continues to develop, increasing user demand and growing data volumes require data centers to constantly expand in size. Cloud vendors typically divide jobs into online jobs and offline jobs based on their characteristics: online jobs are latency-sensitive, while offline jobs can tolerate execution delay. To optimize resource utilization in data centers and reduce operational costs, cloud vendors co-locate online and offline jobs on the same server. Although co-location effectively improves resource utilization, jobs on the same server may compete for shared resources, causing performance interference; this is especially harmful for latency-sensitive online jobs, whose QoS may degrade. Therefore, a comprehensive analysis of performance interference is required so that scheduling and resource management can make quick adjustments to ensure the QoS of online jobs. This thesis focuses on establishing a comprehensive interference-based performance prediction model for online jobs, and on adjusting resources according to the dynamic load of online jobs. The main contributions of this thesis are as follows:

(1) For the co-location of online and offline jobs, this thesis analyzes performance interference and establishes a performance prediction model based on the random forest algorithm. First, it selects multiple different offline jobs and clusters them according to their underlying resource indicators. Second, since offline job combinations are needed to apply pressure to online jobs along various resource dimensions, it constructs a resource indicator prediction model for offline job combinations based on XGBoost (eXtreme Gradient Boosting); by collecting a small number of samples guided by the clustering result, this model avoids the high cost of collecting data for all possible offline job combinations. Then, according to the predicted resource indicators, it uses Latin hypercube sampling to construct specific offline job combinations that apply different levels of pressure to online jobs. Finally, it interferes with the online job through these combinations and proposes a random-forest-based performance prediction model for the online job. Experimental results demonstrate that the model captures the relationship between resource indicators and online job performance, reaching a coefficient of determination of 0.956, which provides guidance for workload scheduling.

(2) As the load of an online job typically fluctuates, this thesis utilizes a combined load prediction model based on LSTM (Long Short-Term Memory) and exponential smoothing, and proposes a dynamic resource adjustment algorithm based on Bayesian optimization to better utilize server resources. First, it analyzes limitation and isolation technologies for various resources and verifies the substitutability of resources for the online job, providing a basis for more fine-grained resource adjustment. Second, the combined LSTM and exponential smoothing model makes accurate predictions of future load, with a coefficient of determination up to 0.926. Finally, it implements a dynamic resource adjustment algorithm based on Bayesian optimization according to the predicted future load. Experimental results illustrate that, compared with a static allocation algorithm, the dynamic resource adjustment algorithm improves the throughput of offline jobs while ensuring the performance of the online job.

In summary, this thesis maintains offline job combinations that apply adjustable pressure to online jobs via a resource indicator prediction model and Latin hypercube sampling, and establishes a comprehensive performance prediction model for online jobs. Additionally, considering load variability, it uses a combined load prediction model to accurately predict future load and implements a Bayesian-optimization-based dynamic resource adjustment algorithm, which improves the throughput of offline jobs while ensuring the performance of the online job.
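The Latin hypercube sampling step, which spreads pressure levels evenly across resource dimensions, can be sketched as follows. This is a minimal illustration, not the thesis's implementation; the resource dimension names and pressure range are assumptions for the example.

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, seed=None):
    """Plain Latin hypercube sample in the unit cube [0, 1)^n_dims:
    each dimension is split into n_samples equal strata, exactly one
    point is drawn per stratum, and stratum order is shuffled
    independently per dimension."""
    rng = np.random.default_rng(seed)
    u = rng.random((n_samples, n_dims))                 # position inside each stratum
    strata = np.array([rng.permutation(n_samples)
                       for _ in range(n_dims)]).T       # shuffled stratum indices
    return (strata + u) / n_samples

# Illustrative resource dimensions (not from the thesis)
dims = ["cpu_pressure", "llc_pressure", "membw_pressure"]
unit = latin_hypercube(8, len(dims), seed=42)
# Scale unit-cube samples to a hypothetical 10%..90% pressure range
levels = 0.1 + unit * 0.8
```

Each row of `levels` then maps to one offline job combination exerting a distinct pressure profile, so that the online job is stressed across the whole resource space with few combinations.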
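The random-forest performance prediction idea can be sketched with scikit-learn; the synthetic data below stands in for the thesis's measured resource indicators and online-job performance, and the feature meanings are purely illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in: 4 resource indicators -> online-job latency
# (e.g. CPU utilization, LLC misses, memory bandwidth, IPC -- illustrative)
X = rng.random((600, 4))
y = (2.0 * X[:, 0] + np.sin(3 * X[:, 1])
     + 0.5 * X[:, 2] * X[:, 3] + rng.normal(0, 0.05, 600))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
# Coefficient of determination on held-out data, as the thesis reports
r2 = r2_score(y_te, model.predict(X_te))
```

On the thesis's real data this model reaches R² = 0.956; the toy data here only demonstrates the fit-and-score workflow.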
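The combined load prediction can be illustrated by pairing a simple exponential smoothing forecast with an externally produced LSTM forecast. The equal-weight combination below is an assumption for the sketch, not the thesis's exact combination rule.

```python
import numpy as np

def exp_smooth_forecast(series, alpha=0.3):
    """Simple exponential smoothing; returns the one-step-ahead forecast
    (the final smoothed level)."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def combined_forecast(lstm_pred, es_pred, w=0.5):
    """Weighted combination of an LSTM forecast (produced elsewhere)
    and the exponential smoothing forecast; w is a hypothetical weight."""
    return w * lstm_pred + (1 - w) * es_pred
```

The smoothing branch tracks the slowly varying load level, while the LSTM branch (not shown) captures longer temporal patterns; combining the two hedges against either model's weakness.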
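The Bayesian optimization step can be sketched with a Gaussian-process surrogate and an expected-improvement acquisition function. This is a generic 1-D sketch standing in for the thesis's resource adjustment search; the search space, objective, and hyperparameters are all assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def bayes_opt(objective, bounds, n_init=5, n_iter=15, seed=0):
    """Minimal Bayesian optimization (maximization) over a 1-D space:
    fit a GP to observed (x, y) pairs, pick the next x by expected
    improvement over a fixed candidate grid, evaluate, repeat."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(n_init, 1))
    y = np.array([objective(x[0]) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    cand = np.linspace(lo, hi, 200).reshape(-1, 1)
    for _ in range(n_iter):
        gp.fit(X, y)
        mu, sigma = gp.predict(cand, return_std=True)
        sigma = np.maximum(sigma, 1e-9)
        best = y.max()
        z = (mu - best) / sigma
        ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
        x_next = cand[np.argmax(ei)]
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next[0]))
    i = np.argmax(y)
    return X[i][0], y[i]

# Hypothetical objective: reward for giving a fraction x of a resource
# to offline jobs, peaking at x = 0.6
best_x, best_y = bayes_opt(lambda x: -(x - 0.6) ** 2, (0.0, 1.0))
```

In the thesis's setting the objective would instead score a candidate resource split by offline throughput subject to the online job's predicted QoS, guided by the load forecast.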
Keywords/Search Tags: data center, performance interference, performance prediction, workload co-location, resource management