| Hydropower is one of the world's main clean energy sources. The aim of hydropower scheduling is to make sound dispatching decisions that improve generation efficiency. Uncertain risk factors such as runoff and electricity price accompany the whole process of reservoir dispatching, yet traditional optimal scheduling methods cope poorly with the curse of dimensionality that these uncertainties cause in risk decisions for power stations, especially cascade power stations. A new solution method for risk scheduling decisions in cascade power stations is therefore of great practical significance. Building on the dual-structure solution approach of stochastic dual dynamic programming (SDDP), this thesis proposes a new risk decision model for cascade reservoirs that uses deep neural networks and reinforcement learning, named Stochastic Dual Deep Neural Networks (SDDNN). The model uses a deep neural network as the evaluation function for the generation benefit of the remaining period, and couples this network with the dual constraints to obtain the optimal decision by recursive optimization. The SDDNN model is divided into three modules: backward sample data set generation, neural network training and reinforcement learning, and forward decision making. This structure allows cascade risk scheduling decisions and the training of the remaining-period benefit valuation network to run asynchronously and in parallel, which improves the speed and flexibility of decision making. The sample data set is generated by backward recursion and Monte Carlo sampling, neural network training and reinforcement learning are used to train and update the valuation network, and forward decision making provides the cascade risk decision scheme. Two optimizations suited to the SDDNN model structure are proposed: parallelization and GPU acceleration. The SDDNN model is applied to cascade risk scheduling decisions for the Three Gorges and Gezhouba project, and detailed analysis and calculation show that the model can provide a feasible risk decision scheme. The main innovative achievements are as follows. (1) On the basis of the original SDDP algorithm, this thesis proposes replacing the remaining-period benefit valuation function with a deep valuation neural network. The model exploits the flexibility of the network and the ease of updating its parameters. Trained on a large number of simulated samples drawn from the random distributions of runoff, electricity price, and other factors, the deep valuation network enables online training and real-time decision making and improves algorithm performance. (2) The thesis proposes a risk scheduling decision model structure based on deep learning, divided into three modules: sample data set generation, deep neural network training and reinforcement learning, and forward risk decision making. Each module can run independently. The sample data generation module provides training samples for the neural network through backward iteration. The neural network training and reinforcement learning module provides the remaining-period benefit valuation network, and the forward decision module provides the risk decision scheme. This structure lends itself to parallel processing and greatly improves solution speed and flexibility in the decision stage. (3) The model trains the deep valuation network with reinforcement learning. By comparing a batch of valuation networks against one another, reinforcement learning selects and refines the better ones. The thesis proposes evaluation criteria based on the convergence speed and stability of the valuation networks, and the highest-scoring networks are retained, so that the best-performing valuation network is kept throughout the iteration. |
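
The three-module structure described in the abstract (sample generation by backward recursion with Monte Carlo sampling, fitting of a remaining-period valuation function, and forward decision making) can be illustrated with a toy sketch. This is not the thesis's implementation: it assumes a single reservoir rather than a cascade, substitutes a linear least-squares fit for the deep valuation network, and omits the dual constraints and the reinforcement-learning selection step. All names, limits, prices, and the uniform inflow distribution are hypothetical.

```python
import random

T = 4                               # number of stages (assumed)
S_MAX, R_MAX = 10.0, 4.0            # storage and release limits (assumed units)
PRICES = [1.0, 1.0, 3.0, 3.0]       # hypothetical electricity price per stage
GRID = [S_MAX * i / 10 for i in range(11)]     # storage levels to sample
RELEASES = [R_MAX * i / 8 for i in range(9)]   # candidate releases

def step(s, inflow, r):
    """Clip release r to the available water; return (release, next storage)."""
    r = min(r, s + inflow)
    return r, min(S_MAX, s + inflow - r)   # water above S_MAX is spilled

def fit_linear(xs, ys):
    """Least-squares line y = a + b*x; stands in for the valuation network."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b

def predict(model, s):
    a, b = model
    return a + b * s

def best_release(t, s, inflow, next_model):
    """Maximize immediate revenue plus estimated remaining-period benefit."""
    best = None
    for r in RELEASES:
        r_ok, s_next = step(s, inflow, r)
        total = PRICES[t] * r_ok + predict(next_model, s_next)
        if best is None or total > best[0]:
            best = (total, r_ok, s_next)
    return best

def train_backward(n_samples=50, seed=0):
    """Module 1 and 2: backward Monte Carlo samples, then fit one model per stage."""
    rng = random.Random(seed)
    models = [None] * T + [(0.0, 0.0)]      # terminal value is zero
    for t in range(T - 1, -1, -1):
        targets = []
        for s in GRID:
            vals = [best_release(t, s, rng.uniform(0.0, 2.0), models[t + 1])[0]
                    for _ in range(n_samples)]
            targets.append(sum(vals) / n_samples)   # expected benefit from s
        models[t] = fit_linear(GRID, targets)
    return models

def decide_forward(models, s0, inflows):
    """Module 3: walk forward through observed inflows using the fitted models."""
    s, plan = s0, []
    for t, q in enumerate(inflows):
        _, r, s = best_release(t, s, q, models[t + 1])
        plan.append((r, s))                 # (release, storage after release)
    return plan

models = train_backward()
plan = decide_forward(models, s0=5.0, inflows=[1.0, 0.5, 1.5, 1.0])
```

Because the forward pass only reads the fitted models, it can run while a separate process regenerates samples and refits them, which is the kind of decoupling the abstract attributes to the SDDNN structure.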