With the establishment of the Energy Internet, the coupled operation of multiple energy systems has become an inevitable trend in the development of energy systems. While multi-energy coupled operation improves operational stability and the consumption rate of renewable energy, it also introduces complex operation modes and a massive number of control variables, so traditional algorithms can no longer meet the optimal scheduling requirements of the Energy Internet. Deep reinforcement learning combines neural networks with the mechanisms of reinforcement learning to solve continuous control problems; it performs well in scenarios with few samples and many constraints on control actions, and therefore offers a new direction for research on control strategies in energy systems.

This paper first analyzes the architecture and operation mode of the Energy Internet. Based on a power system model and a natural gas system model, combined with coupling equipment, an Energy Internet dispatching model is constructed. Since the Energy Internet promotes the large-scale access of renewable energy, a volatility model of load and renewable energy output is added to the dispatching model. On the basis of this scheduling model, a deep reinforcement learning algorithm is applied to achieve optimal scheduling of the Energy Internet, and the scheduling performance is evaluated by the convergence result and the convergence speed.

To improve the convergence result of deep reinforcement learning, this paper proposes an algorithm based on soft actor-critic. By introducing the concept of maximum entropy into the deep deterministic policy gradient framework, the agent's exploration of the action space is improved, which prevents the training process from falling into local optima. In addition, this paper introduces the concepts of environmental state and controllable state, which improves the applicability of reinforcement learning algorithms to Energy Internet scheduling problems. Case-study results show that the stochastic action policy avoids both poor convergence results and actions converging to the boundary of the action space, improving the performance of reinforcement learning in Energy Internet scheduling.

To improve the convergence speed of deep reinforcement learning, this paper redesigns the network structure of the critic within the soft actor-critic framework, combining it with DenseNet to improve the efficiency of neural network training without degrading the convergence result. Case-study results show that introducing the attention mechanism significantly accelerates convergence in the initial stage of training and improves the performance of deep reinforcement learning in complex scenarios.
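The maximum-entropy idea underlying soft actor-critic can be illustrated with a minimal sketch: the objective rewards the policy not only for return but also for keeping its action distribution wide, which encourages exploration and discourages premature collapse to a deterministic, possibly locally optimal, action. The Python below is an illustrative sketch of the entropy-regularized return, assuming a one-dimensional Gaussian policy; the function names and the temperature value `alpha=0.2` are placeholders of my own, not from the paper.

```python
import math

def gaussian_entropy(std):
    # Differential entropy of a 1-D Gaussian policy: 0.5 * ln(2*pi*e*std^2).
    # A wider policy (larger std) has higher entropy.
    return 0.5 * math.log(2.0 * math.pi * math.e * std ** 2)

def soft_return(rewards, entropies, alpha=0.2, gamma=0.99):
    # Entropy-regularized discounted return used in maximum-entropy RL:
    #   G = sum_t gamma^t * ( r_t + alpha * H(pi(.|s_t)) )
    # alpha trades off the reward against the entropy bonus.
    g = 0.0
    for t, (r, h) in enumerate(zip(rewards, entropies)):
        g += gamma ** t * (r + alpha * h)
    return g
```

With `alpha > 0`, two trajectories with identical rewards are ranked by policy entropy, so a policy that keeps exploring the action space scores higher than one that has collapsed onto a single (possibly boundary) action.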