
Research On Value Function Model In Deep Reinforcement Learning

Posted on: 2020-02-21 | Degree: Master | Type: Thesis
Country: China | Candidate: Z T Xia | Full Text: PDF
GTID: 2438330596973188 | Subject: Computer Science and Technology

Abstract/Summary:
Deep reinforcement learning is one of the hotspots of artificial intelligence. By combining the perception ability of deep learning with the decision-making ability of reinforcement learning, researchers have proposed value function models for deep reinforcement learning, built suitable algorithms on top of them, and achieved good results in video games with huge state or action spaces. With the success of Deep Q-Network (DQN), AlphaGo, Rainbow, and other deep reinforcement learning algorithms, the field has attracted growing attention.

Two problems are evident when reinforcement learning is combined with deep learning: (1) because the deep model outputs estimated values, reinforcement learning algorithms that select actions with the max operator suffer from serious over-estimation, which weakens the agent's ability to find the optimal policy; (2) because of the inherent instability of the deep model, together with the instability of the reinforcement learning algorithm itself, the combined algorithm is also prone to instability. To address these difficulties, this thesis improves several classical deep reinforcement learning algorithms. The main contributions are as follows:

(1) The value function model of DQN is improved with advantage learning. While the optimal value is kept unchanged, the non-optimal values are reduced, widening the gap between the optimal and the non-optimal values; as a result, even in the presence of evaluation errors, DQN can still select the optimal action for the current state. Experiments show that the advantage-learning-based algorithm finds a better policy and improves performance. (A sketch of this target is given after the abstract.)

(2) A value function model of DQN based on a correction function is proposed. To address the unreasonable way advantage learning reduces the different non-optimal values, a correction function is introduced that applies a larger reduction to the suboptimal value and smaller reductions to the other non-optimal values, so that the reductions applied in the current state are more reasonable. Experiments show that DQN with the correction function achieves better results than both advantage-learning DQN and plain DQN. (A hypothetical illustration follows the abstract.)

(3) Averaged-DQN is improved. Addressing the facts that Averaged-DQN does not use a target value network and that its training time is too long, this thesis examines the causes of the long training time and analyzes the role of the target value network, and then proposes a new value function model for Averaged-DQN. Experiments show that the improved Averaged-DQN raises performance and reduces training time. (The baseline target is sketched after the abstract.)

(4) The SARSA algorithm is introduced into deep reinforcement learning. The causes of instability in deep reinforcement learning are analyzed first; the value function model of DQN is then improved with SARSA, replacing the more aggressive Q-learning update in DQN with the safer, on-policy SARSA update to construct a deep SARSA network. Experiments show that the deep SARSA network improves both the stability and the performance of the algorithm on some control problems. (The two targets are contrasted in a sketch after the abstract.)
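The following is a minimal sketch of the advantage-learning target referred to in contribution (1), following the standard advantage-learning (AL) operator; the NumPy formulation, the function name, and the scale parameter alpha are illustrative assumptions, not the thesis's exact model.

```python
import numpy as np

def advantage_learning_target(q_s, q_next, a, r, gamma=0.99, alpha=0.9):
    """Advantage-learning (AL) target for one transition (s, a, r, s').

    q_s    : Q(s, .)  from the online network,  shape [n_actions]
    q_next : Q(s', .) from the target network,  shape [n_actions]

    The ordinary DQN target is lowered by alpha times the action gap
    max_b Q(s, b) - Q(s, a): the greedy action's target is unchanged,
    while every non-optimal action is pushed further below it, so
    evaluation errors are less likely to flip the argmax.
    """
    dqn_target = r + gamma * np.max(q_next)
    action_gap = np.max(q_s) - q_s[a]  # >= 0; zero for the greedy action
    return dqn_target - alpha * action_gap
```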
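The abstract does not give the correction function of contribution (2) itself. Purely as a hypothetical illustration of the behaviour it describes (the largest reduction for the suboptimal action, smaller reductions for actions already far below the optimum), the exponential weight below is an assumption, not the thesis's function.

```python
import numpy as np

def corrected_al_target(q_s, q_next, a, r, gamma=0.99, alpha=0.9, lam=1.0):
    """Hypothetical correction-function variant of the AL target."""
    dqn_target = r + gamma * np.max(q_next)
    gap = np.max(q_s) - q_s[a]  # action gap; 0 for the greedy action
    # Assumed correction shape: no reduction for the greedy action,
    # the largest reduction for the smallest positive gap (the
    # suboptimal action), and smaller reductions as the gap grows.
    penalty = 0.0 if gap <= 0 else alpha * np.exp(-lam * gap)
    return dqn_target - penalty
```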
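For contribution (3), the abstract does not detail the new value function model, so only the baseline Averaged-DQN target is sketched here: it averages Q(s', .) over the last K snapshots of the online network before taking the max, which reduces the variance of the target. Names and shapes are assumptions.

```python
import numpy as np

def averaged_dqn_target(q_next_snapshots, r, gamma=0.99):
    """Baseline Averaged-DQN target for one transition.

    q_next_snapshots : shape [K, n_actions]; row k holds Q(s', .)
    under the k-th of the last K snapshots of the online network.
    """
    q_avg = q_next_snapshots.mean(axis=0)  # variance-reduced estimate
    return r + gamma * np.max(q_avg)
```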
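Finally, a minimal sketch contrasting the off-policy Q-learning target used by DQN with the on-policy SARSA target of the deep SARSA network in contribution (4); variable names are assumptions.

```python
import numpy as np

def q_learning_target(q_next, r, gamma=0.99):
    # Off-policy: bootstrap from the greedy action in s' (max operator),
    # which is aggressive and prone to over-estimation.
    return r + gamma * np.max(q_next)

def sarsa_target(q_next, a_next, r, gamma=0.99):
    # On-policy: bootstrap from the action a' the behaviour policy
    # (e.g. epsilon-greedy) actually took in s'; no max operator,
    # giving a safer, more stable update.
    return r + gamma * q_next[a_next]
```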
Keywords/Search Tags: deep learning, reinforcement learning, advantage learning, DQN, Averaged-DQN