
Multi-step Reinforcement Learning with Multi-Peak Exploration

Posted on: 2023-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Gu
Full Text: PDF
GTID: 2558306629475334
Subject: Software engineering
Abstract/Summary:
The combination of deep learning and reinforcement learning has become a mainstream trend, but applying it to real environments still faces many challenges. It requires large amounts of training data, high-dimensional observation spaces, and accurate agent outputs, all of which call for more powerful exploration algorithms. Insufficient exploration causes the algorithm to fall into local optima when optimizing multi-peak objectives, while excessive exploration degrades performance and slows convergence. How to balance exploration and exploitation while accelerating convergence has therefore become a central challenge for reinforcement learning algorithms. To alleviate these problems, this thesis proposes the following three improvements:

i. An exploration framework for reinforcement learning based on truncated Lévy flights with path adjustment. Most current exploration strategies rely on one-step random walks, and much of the path covered during random exploration is repeated; that is, one-step random walks limit the agent's ability to explore a larger region. To address this, the thesis proposes a deep reinforcement learning exploration framework based on multi-step random walks, which uses the Lévy flight algorithm to increase the randomness of multi-step exploration and gives the agent greater exploratory reach. The framework is compatible with most exploration methods and integrates a path-adjustment strategy so that, during a flight, the agent can avoid states it already knows to be poor. Reinforcement learning algorithms combined with this framework achieve better results in both continuous and discrete action environments.

ii. A curiosity-driven multi-step random-walk method. The multi-step strategy is most valuable in previously unfamiliar states; applying it indiscriminately in states the agent is already familiar with wastes exploration and slows convergence. To solve this problem, the thesis proposes a curiosity-driven multi-step random-walk method: in high-curiosity states the agent follows the multi-step walk strategy, while in low-curiosity regions it takes the action output by the policy network. This removes the exploration waste of the previous framework. The framework is compared with the multi-step random-walk algorithm without curiosity drive, and experiments show that the curiosity-driven variant obtains more reward in most environments.

iii. A recent-experience λ-return variance-correction algorithm based on the Actor-Critic framework. Most reinforcement learning algorithms update the value function with the one-step temporal-difference method, whose variance is low but which relies on bootstrapping to propagate value updates step by step before converging; this slows convergence and makes the algorithm prone to local optima. To address this, the thesis proposes a recent-experience λ-return variance-correction algorithm, which combines returns computed from recent experience with the global experience in the one-step temporal-difference update and applies a variance correction to the λ-return component to further alleviate the variance introduced by multi-step temporal differences. Experiments show that the recent-experience λ-return variance-correction algorithm based on the AC framework obtains higher cumulative rewards than most traditional random-exploration baselines.
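For readers who want the idea behind contribution i in code, the following is a minimal, hypothetical sketch of how a truncated Lévy-flight step length could drive multi-step random-walk exploration on a Gym-style environment. The Mantegna sampling scheme, the function names, the beta and max_steps values, and the omission of the path-adjustment step are illustrative assumptions, not the thesis's exact formulation.

```python
import math
import numpy as np

def truncated_levy_length(beta=1.5, max_steps=10, rng=np.random):
    """Sample a multi-step walk length from a heavy-tailed Levy distribution
    (Mantegna's method) and truncate it so the agent never wanders too far."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma_u)
    v = rng.normal(0.0, 1.0)
    step = abs(u) / (abs(v) ** (1 / beta) + 1e-8)
    return int(np.clip(math.ceil(step), 1, max_steps))

def levy_exploration_burst(env, state, beta=1.5, max_steps=10):
    """Instead of a single one-step random perturbation, execute a burst of
    random actions whose length follows the truncated Levy distribution."""
    transitions = []
    for _ in range(truncated_levy_length(beta, max_steps)):
        action = env.action_space.sample()               # purely exploratory action
        next_state, reward, done, *_ = env.step(action)  # gym-style step (sketch only)
        transitions.append((state, action, reward, next_state, done))
        state = next_state
        if done:
            break
    return transitions, state
```

The heavy tail of the Lévy distribution means most bursts stay short while occasional long bursts carry the agent into distant, less-visited regions, which is the intuition behind replacing the one-step random walk.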
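Contribution ii gates the multi-step walk on a curiosity signal. Below is a hedged sketch assuming an ICM-style forward model whose prediction error stands in for curiosity, reusing the hypothetical levy_exploration_burst from the previous sketch; the threshold, network sizes, and gating rule are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    """Hypothetical curiosity module: predicts the next state from (state, action).
    A large prediction error marks a state as unfamiliar, i.e. high curiosity."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def curiosity(self, state, action, next_state):
        pred = self.net(torch.cat([state, action], dim=-1))
        return ((pred - next_state) ** 2).mean(dim=-1)   # per-transition prediction error

def act_with_curiosity(env, state, policy, curiosity_value, threshold=0.5):
    """Curiosity-gated control: multi-step Levy burst in unfamiliar states,
    the policy network's own action in familiar ones."""
    if curiosity_value > threshold:
        return levy_exploration_burst(env, state)          # explore: burst of random steps
    action = policy(state)                                  # exploit: trust the learned policy
    next_state, reward, done, *_ = env.step(action)
    return [(state, action, reward, next_state, done)], next_state
```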
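As a rough illustration of contribution iii, the sketch below computes λ-returns over a short window of recent experience, the quantity a critic in an Actor-Critic method would regress toward. The thesis's variance-correction term is not reproduced here; the function name, terminal-masking convention, and the γ and λ values are assumptions.

```python
import numpy as np

def lambda_returns(rewards, values, dones, bootstrap_value, gamma=0.99, lam=0.95):
    """Backward recursion for the lambda-return over a recent-experience window:
        G_t = r_t + gamma * [(1 - lam) * V(s_{t+1}) + lam * G_{t+1}],
    with the recursion cut at terminal transitions."""
    T = len(rewards)
    returns = np.zeros(T)
    for t in reversed(range(T)):
        v_next = bootstrap_value if t == T - 1 else values[t + 1]
        g_next = bootstrap_value if t == T - 1 else returns[t + 1]
        mask = 1.0 - float(dones[t])
        returns[t] = rewards[t] + gamma * mask * ((1 - lam) * v_next + lam * g_next)
    return returns

# The critic regresses toward these targets, and the actor can use
# (G_t - V(s_t)) as its advantage estimate; the thesis's correction would
# additionally adjust the variance of the lam-weighted part of the target.
```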
Keywords/Search Tags: reinforcement learning, random walk, exploratory enhancement, fast convergence, multi-step reinforcement learning