Currently, China's carbon emissions related to building operation energy consumption account for about 22% of the national total. The development of renewable energy technologies, including photovoltaic (PV), wind power, and others, is essential to meet the growing demand for building energy and to achieve the dual carbon goals. With the maturing of renewable energy technologies and falling installation costs, the total installed capacity of renewable energy systems such as PV and wind power has been rising, and more and more users are becoming prosumers who produce and consume energy simultaneously. Promoting the local consumption of distributed renewable energy is therefore significant for reducing the carbon emissions of building operation. However, renewable energy generation is intermittent and unstable, and user loads are time-varying, which causes a mismatch between renewable generation and user demand and greatly limits the local consumption capacity of distributed renewable energy. When the "generation", "load", "storage", and other links of a renewable energy building can be finely managed and controlled in real time, building energy can be coordinated across different periods to raise the level of renewable energy consumption; this also benefits the supply-demand balance of the user's energy system and helps maintain grid stability.

Compared with other algorithms, data-driven reinforcement learning algorithms are popular in the field of energy system regulation because of their strong adaptivity and low requirements for model accuracy. This paper investigates optimization methods for renewable energy building energy systems under real-time tariffs based on different reinforcement learning algorithms, in order to obtain more effective and attractive regulation strategies. It uses 27,696 records of measured real-time PV generation and electrical load data from a zero energy house (ZEH), including 17,520 records for model training and 10,176 for model testing. The optimization objective is to minimize energy cost, and an optimal scheduling model for the renewable energy building energy system is developed in a Python simulation environment. Models are built with the Q-learning and deep Q-network (DQN) algorithms, which use discrete action control, and with the deep deterministic policy gradient (DDPG) algorithm, which uses continuous action control. The testing results of the three reinforcement learning models are then compared with the results of rule-based operation of the renewable energy system. Metrics such as the self-consumption ratio, feed-in ratio, and annual energy cost are introduced to quantitatively and objectively evaluate the operation effects of the three algorithms in terms of model convergence, supply-demand balancing, impact on local PV consumption, and economic benefits.
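As a concrete illustration of the discrete-action formulation, the following is a minimal sketch of tabular Q-learning for hourly battery scheduling under a real-time tariff. The state discretization, action set, battery parameters, tariff levels, and reward definition are all assumptions made for this sketch, not the paper's exact formulation.

```python
import numpy as np

# Illustrative setup; all constants here are assumptions for this sketch.
N_HOURS = 24                        # one episode = one day at hourly resolution
N_SOC_BINS = 10                     # discretized battery state of charge
PRICE_EDGES = [0.3, 0.5, 0.8, 1.2]  # assumed real-time tariff bin edges (CNY/kWh)
ACTIONS = [-1.0, 0.0, 1.0]          # battery power in kW: discharge / idle / charge
BATTERY_KWH = 10.0                  # assumed battery capacity

GAMMA = 0.3    # discount factor, within the 0.1-0.5 range found to converge well
ALPHA = 0.1    # learning rate
EPSILON = 0.1  # epsilon-greedy exploration rate

Q = np.zeros((N_HOURS, N_SOC_BINS, len(PRICE_EDGES) + 1, len(ACTIONS)))

def encode(t, soc, price):
    """Map (hour, state of charge, tariff level) to a discrete state index."""
    return (t, int(soc * (N_SOC_BINS - 1)), int(np.digitize(price, PRICE_EDGES)))

def step(soc, pv, load, price, action_kw):
    """Apply a one-hour battery action; reward is the negative grid energy cost.
    Battery losses and feed-in revenue are omitted for brevity."""
    charge = float(np.clip(action_kw, -soc * BATTERY_KWH, (1 - soc) * BATTERY_KWH))
    grid = load - pv + charge            # net import from the grid (kWh over 1 h)
    return soc + charge / BATTERY_KWH, -max(grid, 0.0) * price

def train_episode(pv, load, price):
    """One day of epsilon-greedy tabular Q-learning over hourly data."""
    soc = 0.5
    for t in range(N_HOURS):
        s = encode(t, soc, price[t])
        a = (np.random.randint(len(ACTIONS)) if np.random.rand() < EPSILON
             else int(np.argmax(Q[s])))
        soc, r = step(soc, pv[t], load[t], price[t], ACTIONS[a])
        s_next = encode((t + 1) % N_HOURS, soc, price[(t + 1) % N_HOURS])
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s + (a,)] += ALPHA * (r + GAMMA * np.max(Q[s_next]) - Q[s + (a,)])
```

DQN replaces the table with a neural network over the same kind of state, while DDPG makes the battery power itself a continuous output, as sketched later.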
The main contents and conclusions of this work are summarized as follows. Firstly, the convergence of the reinforcement learning algorithms for ZEH operation is verified through hyperparameter tuning: when the discount factor is between 0.1 and 0.5, the reinforcement learning models achieve better convergence performance. Secondly, based on the supply-demand balancing effects of the energy system on typical winter and summer days and weeks, all three reinforcement learning algorithms learn the rule of discharging when electricity prices are high and charging when they are low under real-time pricing. The DDPG agent achieves the best learning effect: it responds to changes in the surrounding environment in time with larger charging or discharging actions and effectively smooths the grid's peak demand. The DQN and Q-learning agents have similar learning effects, although the Q-learning algorithm achieves more flexible dispatch under real-time tariff fluctuations. Thirdly, in terms of enhancing local PV consumption, the DDPG-based model achieves the highest PV self-consumption ratio of 49.4% and energy self-sufficiency ratio of 36.7%; the self-consumption ratio of 47.4% achieved by Q-learning is better than the 46.7% of DQN. This zero energy house still has large potential to improve its local consumption of renewable energy. Finally, ZEH energy operation based on all three reinforcement learning models achieves better economic dispatch than the rule-based baseline. The DDPG-based model achieves the highest economic efficiency and higher battery utilization, and DQN-based regulation saves 1.1% more in energy operating costs than Q-learning-based regulation. It is worth noting that DDPG-based operation reduces the typical summer monthly energy cost by more than 45.5% compared with the rule-based result.
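The practical difference with DDPG is that the charging or discharging power is a continuous network output rather than a choice from a fixed action set, which is what allows the larger, finely graded actions observed above. A minimal actor network in the DDPG style is sketched below; the layer sizes, state contents, and power limit are assumptions for this sketch.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """DDPG-style actor: maps a continuous state (e.g., hour, tariff, PV, load,
    state of charge) to a continuous battery power in [-p_max, p_max] kW."""
    def __init__(self, state_dim=5, p_max=3.0):  # dimensions and limit are assumptions
        super().__init__()
        self.p_max = p_max
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh(),         # tanh bounds the raw action to [-1, 1]
        )

    def forward(self, state):
        return self.p_max * self.net(state)      # scale to the physical power limit
```

Training pairs this actor with a critic and target networks in the usual DDPG fashion; only the continuous action interface is shown here.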
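For reference, the ratios quoted above follow their standard definitions: the self-consumption ratio is the share of PV generation used on site, the feed-in ratio is the share exported to the grid, and the self-sufficiency ratio is the share of load covered by PV. A minimal sketch of their computation from interval data is given below; ignoring battery flows is a simplification made for this sketch.

```python
import numpy as np

def evaluate(pv, load):
    """Compute PV self-consumption, feed-in, and self-sufficiency ratios from
    generation and load time series (kWh per interval). With storage, `pv_used`
    would also include PV energy shifted through the battery (omitted here)."""
    pv, load = np.asarray(pv), np.asarray(load)
    pv_used = np.minimum(pv, load)                 # PV consumed on site per interval
    self_consumption = pv_used.sum() / pv.sum()    # share of PV used locally
    feed_in = 1.0 - self_consumption               # share of PV exported
    self_sufficiency = pv_used.sum() / load.sum()  # share of load covered by PV
    return self_consumption, feed_in, self_sufficiency
```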
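The rule-based baseline is not spelled out in this summary; a plausible price-unaware form, given here only as an assumed sketch, charges the battery from PV surplus and discharges it to cover deficits.

```python
def rule_based_action(pv_kw, load_kw, soc, battery_kwh=10.0, p_max=3.0):
    """Assumed baseline: store PV surplus, discharge to cover deficits.
    Returns battery power in kW (positive = charge, negative = discharge)."""
    surplus = pv_kw - load_kw
    if surplus > 0:  # excess PV: charge within headroom and power limit
        return min(surplus, (1.0 - soc) * battery_kwh, p_max)
    # deficit: discharge within stored energy and power limit
    return -min(-surplus, soc * battery_kwh, p_max)
```

Because such a rule ignores the tariff, it cannot shift charging toward low-price hours, which is the behavior the learned policies exploit to reduce energy costs.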