Research On The Design Of Agent-based Decision Model For Games Based On Reinforcement Learning

Posted on: 2021-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: X Wang
GTID: 2370330623968600
Subject: Engineering
Abstract/Summary:
At present, most research relies on value-based reinforcement learning algorithms built on the Q function, such as DQN, and comparatively little attention has gone to the more intuitive policy-based algorithms. In games, continuous high-dimensional state and action spaces are a major obstacle to applying reinforcement learning to decision problems. To address this, the thesis studies a basic policy search method, the deterministic policy gradient algorithm. It analyzes the algorithm's strengths and weaknesses and remedies its defects. It then proposes an improved model, a clipped double Q policy gradient algorithm, and discusses how each improved component affects the experimental results. Finally, the algorithm is trained on four continuous high-dimensional tasks on a game platform to show how much it improves performance on this problem.
The thesis is organized in five parts:
(1) Chapter 1 briefly introduces the essence of reinforcement learning and its fields of development and application. It then introduces deep learning as the underlying method, describing its history and current state, and closes with a brief overview of the development of deep reinforcement learning (DRL).
(2) Chapter 2 analyzes the mathematical model underlying reinforcement learning, the Markov decision process, together with the Bellman optimality equations. It presents value iteration and policy iteration as the basic solution methods. It then analyzes two model-free solution methods built on them: the Monte Carlo method and the temporal-difference (TD) method.
(3) Building on the policy iteration and temporal-difference methods of the previous chapter, Chapter 3 presents the deterministic policy gradient algorithm as the baseline method to be improved. It analyzes the estimation error introduced by the Q network and the error that accumulates over successive updates. It then proposes three improvements: clipped double Q-learning, target networks with delayed policy updates, and target policy smoothing regularization.
(4) Chapter 4 uses MuJoCo games accessed through the Gym interface as the environment platform. With the same environments and network structures, it compares the improved algorithm against other policy-iteration algorithms. It then runs a series of ablation experiments on the individual components of the improved algorithm and discusses what the results imply.
(5) Chapter 5 summarizes the thesis, discusses the problems that the deterministic policy gradient algorithm still leaves unresolved, and outlines prospects for its future improvement and application.
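The Bellman optimality backup and value iteration from Chapter 2 can be illustrated on a toy problem. The following is a minimal sketch that assumes a hypothetical two-state, two-action deterministic MDP, not taken from the thesis. It repeatedly applies V(s) ← max_a [r(s,a) + γ·V(s')] until the values stop changing, then reads off the greedy policy. The names `P`, `GAMMA` and `value_iteration` are illustrative only.

```python
# Hypothetical deterministic MDP (not from the thesis):
# P[state][action] = (next_state, reward)
P = {
    0: {0: (0, 0.0), 1: (1, 1.0)},
    1: {0: (0, 0.0), 1: (1, 2.0)},
}
GAMMA = 0.9  # discount factor (assumed value)


def value_iteration(P, gamma, tol=1e-8, max_iters=10_000):
    """Value iteration on a deterministic MDP given as nested dicts."""
    V = {s: 0.0 for s in P}
    for _ in range(max_iters):
        new_V = {}
        delta = 0.0
        for s, actions in P.items():
            # Bellman optimality backup: V(s) = max_a [r(s, a) + gamma * V(s')]
            new_V[s] = max(r + gamma * V[s2] for (s2, r) in actions.values())
            delta = max(delta, abs(new_V[s] - V[s]))
        V = new_V
        if delta < tol:  # stop once a full sweep barely changes the values
            break
    # Greedy policy extraction from the converged value function
    policy = {
        s: max(actions, key=lambda a: actions[a][1] + gamma * V[actions[a][0]])
        for s, actions in P.items()
    }
    return V, policy
```

With γ = 0.9, staying in state 1 earns 2 per step, so V(1) converges to 2/(1 − 0.9) = 20. V(0) converges to 1 + 0.9·20 = 19, and action 1 is greedy in both states. Policy iteration would reach the same fixed point by alternating full policy evaluation with greedy improvement instead of a single max-backup per sweep.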
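The three improvements named for Chapter 3 are the same three measures used by TD3 (Fujimoto et al., 2018). The following NumPy sketch assumes they take that usual form. It is not the thesis's implementation. The actor and critics are hypothetical toy callables standing in for neural networks. The hyperparameter defaults (`policy_noise=0.2`, `noise_clip=0.5`, `tau=0.005`, `POLICY_DELAY=2`) are common choices assumed here, not values reported in the thesis.

```python
import numpy as np


def clipped_double_q_target(reward, next_state, done, actor_target,
                            critic1_target, critic2_target, gamma=0.99,
                            policy_noise=0.2, noise_clip=0.5, max_action=1.0,
                            rng=None):
    """TD target with target policy smoothing and clipped double Q-learning."""
    rng = np.random.default_rng() if rng is None else rng
    # Target policy smoothing: perturb the target action with clipped Gaussian noise
    next_action = actor_target(next_state)
    noise = np.clip(rng.normal(0.0, policy_noise, size=next_action.shape),
                    -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, -max_action, max_action)
    # Clipped double Q-learning: use the smaller of the two target critics
    q_next = np.minimum(critic1_target(next_state, next_action),
                        critic2_target(next_state, next_action))
    # Terminal transitions (done = 1) do not bootstrap
    return reward + gamma * (1.0 - done) * q_next


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target (in place)."""
    for name in target_params:
        target_params[name] = tau * online_params[name] + (1.0 - tau) * target_params[name]


# Hypothetical toy networks for illustration only
actor_t = lambda s: np.tanh(s.sum(axis=1, keepdims=True))
critic1_t = lambda s, a: s.sum(axis=1, keepdims=True) + a
critic2_t = lambda s, a: s.sum(axis=1, keepdims=True) + 0.5 * a

# Delayed policy updates: critics update every step; the actor and the
# target networks update only every POLICY_DELAY steps.
POLICY_DELAY = 2
online = {"w": np.ones(3)}
target = {"w": np.zeros(3)}
for step in range(1, 5):
    # (critic regression toward clipped_double_q_target would happen here)
    if step % POLICY_DELAY == 0:
        # (actor gradient step would happen here)
        soft_update(target, online, tau=0.5)  # large tau only to make the demo visible
```

Taking the minimum of two critics counters the overestimation bias of a single learned Q function. The smoothing noise keeps the critic from exploiting narrow peaks in its own estimate. Delaying actor updates lets the critic settle before the policy follows it. In the demo loop, the target weights move only on steps 2 and 4, reaching 0.5 and then 0.75.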
Keywords/Search Tags: Deep reinforcement learning, Policy gradient, Double Clipped Network, Game intelligence