With the growth of electricity demand, the restructuring of the energy industry, and the continuous advance of generation technology, energy dispatch has become a central problem in the energy management of microgrids. Traditional methods formulate energy dispatch as a pure optimization problem, which fails to capture the dynamics and uncertainty of the system. In contrast, reinforcement learning accounts for how the system evolves over time and can handle uncertainty in the decision-making process. This thesis focuses on applying reinforcement learning to the decision-making problem in microgrid energy dispatch. The main research contents are as follows.

First, based on the basic theory of reinforcement learning and the Markov Decision Process (MDP), this thesis constructs a model of the microgrid energy system, modeling the diesel generator and the battery device separately, and establishes the optimization objectives of minimizing generation cost while maintaining the balance between energy supply and demand. On this model, the thesis proposes the FH-DDPG algorithm for the finite-horizon, fixed-timestep setting, which innovatively applies multiple actor networks and backward induction to address the poor training stability and slow convergence of the DDPG algorithm. Simulation results show that FH-DDPG minimizes generation cost while keeping supply and demand balanced.

Second, building on the above and considering the uncertainty caused by incomplete and untimely information in real scenarios, this thesis extends the MDP to a Partially Observable Markov Decision Process (POMDP) for the microgrid energy system, overcoming difficulties such as the complexity of sampling historical information and the unavailability of history for the initial unit. The observation space and history space are then defined, and the differences between historical information and state information are analyzed. Finally, an FH-RDPG algorithm is proposed, which improves the utilization efficiency of historical information through a carefully designed network. Simulation results show that FH-RDPG achieves better convergence and model generalization in energy dispatch tasks, both when training and testing on the same time period and across multiple different historical periods.
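The core idea behind the finite-horizon, multi-actor design can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything below is an assumption for illustration: a toy microgrid with a hypothetical per-step demand profile, a quadratic diesel-generation cost, a shortage penalty, and one scalar "actor" parameter per timestep (the fraction of demand served by diesel) fitted by grid search instead of gradient updates. What the sketch does share with FH-DDPG is the backward-induction structure: the actor for the last step is fitted first, and each earlier actor is fitted against the already-fixed later ones.

```python
H = 4                             # finite horizon (number of dispatch steps)
DEMAND = [3.0, 5.0, 2.0, 4.0]     # hypothetical load per step (kW)
SOC0 = 4.0                        # initial battery state of charge (kWh)
GEN_COST = 0.5                    # quadratic cost coefficient for diesel generation
SHORT_PEN = 10.0                  # penalty per kWh of unmet demand

def step(soc, t, gen):
    """One transition: diesel covers `gen`, the battery covers the residual."""
    residual = DEMAND[t] - gen
    discharge = min(max(residual, 0.0), soc)        # battery cannot go negative
    shortage = max(residual - discharge, 0.0)       # unmet demand is penalized
    cost = GEN_COST * gen ** 2 + SHORT_PEN * shortage
    return soc - discharge + max(-residual, 0.0), cost  # surplus diesel charges the battery

def rollout(soc, t, thetas):
    """Cost-to-go from step t under the per-step actors in `thetas`."""
    total = 0.0
    for k in range(t, H):
        gen = thetas[k] * DEMAND[k]                 # actor k: diesel fraction of demand
        soc, c = step(soc, k, gen)
        total += c
    return total

# Backward induction: fit the last actor first, then each earlier actor,
# evaluating every candidate against the already-fixed later actors.
# (Evaluating from the nominal SOC0 is a simplification of sampling states.)
thetas = [0.0] * H
for t in reversed(range(H)):
    candidates = [i / 20 for i in range(21)]        # coarse search over [0, 1]
    thetas[t] = min(candidates,
                    key=lambda th: rollout(SOC0, t, thetas[:t] + [th] + thetas[t + 1:]))

print([round(th, 2) for th in thetas])
```

In this toy instance the backward pass learns to hold battery charge in reserve for the final step, and the resulting dispatch is never costlier than running diesel for the full demand at every step.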
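The distinction between state and history in the POMDP part can also be sketched. The thesis's exact definitions of the observation and history spaces are not given in the abstract, so the buffer below is only a hypothetical illustration: the agent never sees the full state, only partial observations, and a recurrent policy is fed a fixed-length window of past observation-action pairs. Zero-padding the window is one simple way around the difficulty that no real history exists for the initial unit.

```python
from collections import deque

WINDOW = 3  # hypothetical history length fed to the recurrent policy

class HistoryBuffer:
    """Maintains a fixed window of past (observation, action) pairs.

    Unlike the Markov state, a single observation is not sufficient for
    decision-making under partial observability; the policy conditions on
    this history instead. Slots before the first real step are zero-padded.
    """
    def __init__(self, window=WINDOW):
        self.pairs = deque([(0.0, 0.0)] * window, maxlen=window)

    def push(self, obs, action):
        self.pairs.append((obs, action))  # oldest pair drops out automatically

    def features(self):
        # Flatten the window into one vector for the policy network input.
        return [x for pair in self.pairs for x in pair]

buf = HistoryBuffer()
buf.push(obs=2.5, action=1.0)   # e.g. observed net load, diesel setpoint
buf.push(obs=3.1, action=0.5)
print(buf.features())           # zero-padded oldest slot, then the two real pairs
```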