| With the development of cloud computing, container technology represented by Docker has been quickly adopted and has become the preferred solution thanks to its light weight, low resource consumption, and fast startup. Kubernetes stands out for its powerful container orchestration capabilities and has become the de facto standard in the field. However, the current auto-scaling solution of Kubernetes cannot ensure service quality in complex scenarios. This paper analyzes the auto-scaling principle of Kubernetes, which scales individual services reactively. As a result, scaling often lags behind load changes, and bottlenecks shift from one service to another. To solve these problems, this paper proposes the following optimization strategies for auto-scaling: (1) A load prediction model based on an LSTM network and an attention mechanism is proposed. The model takes into account the influence of various load indicators on the predicted load, mines the time-series features of the load data and the correlations between different loads through a convolutional neural network, and then uses channel attention to weight the extracted features. Finally, it uses a bi-directional LSTM with temporal attention to make the prediction. Experiments on real load data show that the proposed model outperforms traditional algorithms such as LSTM in prediction accuracy, laying the foundation for optimizing Kubernetes auto-scaling. (2) A scaling method based on deep reinforcement learning is proposed. This method uses the load prediction model from the first part to model the reinforcement learning environment for scaling multi-service applications. It makes scaling decisions with an improved DQN and learns the optimal auto-scaling strategy by continuously interacting with the environment. As a result, the model can adjust the instances of multiple services simultaneously within one scaling decision cycle. Experiments are conducted on a Kubernetes cluster to analyze the scaling effect and performance. The results show that the proposed method can respond in advance to changes in traffic and effectively scale the application as a whole, improving resource utilization while assuring service quality. |
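
The reactive behaviour that the abstract attributes to the default Kubernetes autoscaler can be illustrated with a minimal sketch of the replica rule documented for the Horizontal Pod Autoscaler, `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, with a default tolerance of 0.1. The function and parameter names below are hypothetical and chosen only for illustration; the sketch is not the method proposed in the paper.

```python
import math


def reactive_desired_replicas(current_replicas, current_metric, target_metric,
                              tolerance=0.1):
    """Sketch of the documented HPA replica rule for a single service.

    All names here are illustrative assumptions. The rule reacts only to
    the metric that has already been observed, which is why scaling can
    lag behind a rising load.
    """
    ratio = current_metric / target_metric
    # Within the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# 4 replicas at 90% average CPU against a 60% target
print(reactive_desired_replicas(4, 90.0, 60.0))
```

Because the rule takes a single service's observed metric as input, it cannot act before the load changes or coordinate several services at once, which are the two limitations the abstract sets out to address.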