| As China’s economy has developed, its energy consumption has continued to grow. Buildings account for a large and rising share of the national total, so research on energy-saving building design and building energy consumption is urgent. Air conditioning units account for a relatively high proportion of building energy consumption, so controlling them is an effective way to reduce it. This study addresses the limitations of current building equipment control strategies: with building energy conservation as its central goal, it uses reinforcement learning to adjust the control strategy iteratively until an optimal strategy is obtained. Existing monitoring data for air conditioning systems are limited, whereas deep reinforcement learning algorithms need a large amount of data to converge, which seriously hinders their application in practical engineering. In this study, the deep deterministic policy gradient (DDPG) algorithm is improved by means of bisimulation metrics, sample ordering, and related techniques, reducing the amount of data required. In addition, a double deep Q-network (DDQN) is used to predict the load of the air conditioning unit, which further optimizes its control strategy and thereby saves building energy. The work consists of the following three parts: (1) An enhanced deep deterministic policy gradient (E-DDPG) algorithm is proposed to address the slow convergence and large data requirements of the DDPG algorithm. Building on DDPG, the algorithm constructs two new sample pools: a diversity sample pool and a high-error sample pool. During execution, training samples are selected proportionally from the diversity sample pool and the high-error sample pool to take into account both sample diversity and sample value 
information, improving both sample utilization efficiency and the convergence performance of the algorithm. In addition, the rationality of using the bisimulation metric to measure sample similarity is proved theoretically, and the relationship between the value function and sample similarity is established. The E-DDPG and DDPG algorithms are applied to the classical Pendulum and Mountain Car problems, and the experimental results show that E-DDPG converges with less data. (2) Considering both the factors that affect the load of the air conditioning unit and whether the corresponding data can actually be collected, the input parameters for air conditioning load forecasting are determined: outdoor temperature, outdoor relative humidity, and the air conditioning system load at the three preceding time steps. Because sudden power failures, system freezes, and other faults leave gaps in the monitoring data, the data must be preprocessed. Load forecasting, together with its influencing factors, is then modeled as a Markov decision problem. To avoid overestimating the action-value function in reinforcement learning, the DDQN algorithm is used for load forecasting. Finally, experiments are carried out on building energy monitoring data recorded at an environmental college. The results show that the load forecasting method based on deep reinforcement learning achieves higher accuracy for building load forecasting and can guide the optimization of air conditioning equipment control. (3) Based on the improved deep reinforcement learning algorithm E-DDPG and the air conditioning load forecasting algorithm proposed in this study, optimal control of the air conditioning system is realized. First, based on existing research and an analysis of the equipment in the air conditioning cold source system, the control parameters for optimal control are 
established, namely the chilled water outlet temperature, chilled water pump flow, cooling water inlet temperature, and cooling water pump flow. The constraints on equipment operation are then set according to the working characteristics of the equipment, and finally the objective of the optimal control is defined. Based on this analysis, a Markov model is constructed, and the actual data are processed and normalized to address problems in the data and differences in dimension. The E-DDPG algorithm is then used to obtain the optimal control parameters for different load intervals. Finally, the load forecasting algorithm predicts the load of the air conditioning system at the next moment, and the equipment parameters are adjusted in real time to the optimal parameters for that load, achieving energy savings for the air conditioning system. |
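
The proportional selection of training samples from the two pools described in part (1) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the mixing proportion or how the pools are built, so the function name `sample_mixed_batch` and the parameter `diversity_ratio` are hypothetical, and both pools are assumed to be already populated lists of transitions.

```python
import random


def sample_mixed_batch(diversity_pool, high_error_pool, batch_size,
                       diversity_ratio=0.5, rng=None):
    """Draw one training batch proportionally from two sample pools.

    diversity_ratio is a hypothetical hyperparameter (not from the source):
    the fraction of the batch taken from the diversity pool. The remainder
    comes from the high-error pool.
    """
    rng = rng or random.Random()
    # Number of samples from each pool, capped by what each pool holds.
    n_div = min(int(round(batch_size * diversity_ratio)), len(diversity_pool))
    n_err = min(batch_size - n_div, len(high_error_pool))
    # Sample without replacement inside each pool, then mix the two parts.
    batch = rng.sample(diversity_pool, n_div) + rng.sample(high_error_pool, n_err)
    rng.shuffle(batch)
    return batch


# Usage with placeholder transitions tagged by their pool of origin.
diversity_pool = [("diverse", i) for i in range(20)]
high_error_pool = [("high_error", i) for i in range(20)]
batch = sample_mixed_batch(diversity_pool, high_error_pool,
                           batch_size=8, diversity_ratio=0.75,
                           rng=random.Random(0))
```

With `batch_size=8` and `diversity_ratio=0.75`, six samples come from the diversity pool and two from the high-error pool. If a pool holds fewer samples than requested, this sketch returns a smaller batch rather than padding it; a real training loop would need to decide how to handle that case.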