| In the context of the Energy Internet, optimal dispatch of a combined heat and power system is of great significance to the energy complementarity and economic operation of the system. Optimal operation of the combined heat and power system involves solving nonlinear, non-convex, multi-objective problems, and traditional methods struggle with real-time computation and iterative convergence. Reinforcement learning offers an effective way to address these difficulties. First, a mathematical model of optimal dispatch for the combined heat and power system was constructed. The optimization model was then recast in a reinforcement learning framework based on the Markov decision process, and the agent's state and action spaces were defined. The agent's reward and penalty mechanism was designed from the system optimization goals and operating constraints, and a reinforcement learning environment suited to the combined heat and power system was set up. The agent network structure was designed and trained with the deep deterministic policy gradient (DDPG) algorithm and the soft actor-critic (SAC) algorithm. Next, considering the industry barriers between different energy systems and the difficulty of complete data exchange, the combined heat and power system was divided into multiple agents according to the different stakeholders. A multi-agent actor-critic framework suited to the combined heat and power system was established based on the multi-agent deep deterministic policy gradient (MADDPG) reinforcement learning algorithm, and the optimization model was transformed into a multi-agent reinforcement learning model. The state and action spaces of each agent were determined, a multi-agent reinforcement learning environment was built, and the corresponding reward function was designed. Finally, simulation examples were carried out. A combined heat and power system consisting of a 6-node power grid and a 6-node heating network was established first for verification, and the results show that the proposed methods can effectively solve the optimization problem of the combined heat and power system. The soft actor-critic method has better computational performance than the deep deterministic policy gradient algorithm. The trained reinforcement learning model can generate optimization strategies in real time, which overcomes the long computation times of traditional methods and their difficulty in meeting online computing requirements. Compared with the single-agent algorithm, training with the multi-agent algorithm converges more stably. During execution, each agent relies only on local information to complete its calculation, which resolves the data-sharing problem among different stakeholders. A second simulation example based on a combined system with a 33-node power grid and a 32-node heating network further verifies the generality of the proposed method. |
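
The reward and penalty mechanism described above, built from optimization goals and operating constraints, can be illustrated with a minimal Python sketch. It is not the paper's formulation: it assumes a single scalar operating cost as the goal and a single box constraint on one unit's electric output, and the names `Limits`, `violation`, and `reward`, along with the penalty weight, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Hypothetical operating limits for one unit (illustrative only)."""
    p_min: float    # assumed lower bound on electric output (MW)
    p_max: float    # assumed upper bound on electric output (MW)
    penalty: float  # assumed penalty weight per MW of constraint violation


def violation(value: float, lower: float, upper: float) -> float:
    """Distance by which value lies outside [lower, upper]; 0.0 if inside."""
    return max(lower - value, 0.0, value - upper)


def reward(cost: float, power: float, limits: Limits) -> float:
    """Negative operating cost minus a penalty proportional to the violation.

    Lower cost gives higher reward; leaving the feasible range is punished.
    """
    return -cost - limits.penalty * violation(power, limits.p_min, limits.p_max)


if __name__ == "__main__":
    limits = Limits(p_min=10.0, p_max=60.0, penalty=1000.0)
    print(reward(100.0, 50.0, limits))  # feasible action: only the cost counts
    print(reward(100.0, 65.0, limits))  # 5 MW above the limit: penalized
```

In a formulation like this, the penalty weight trades off cost minimization against constraint satisfaction. A multi-agent variant would compute such a term per stakeholder from that agent's local quantities.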