
Research On Active Exploration Reinforcement Learning Algorithm

Posted on: 2021-03-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D F Zhao
Full Text: PDF
GTID: 1488306569484204
Subject: Computer Science and Technology
Abstract/Summary:
Reinforcement learning is an important branch of machine learning: a computational approach in which an agent learns from interaction with its environment. It focuses on the sequential decision-making of an agent pursuing goals in an unknown environment and is widely used in natural language processing, robot control, and other fields. Unlike traditional machine learning algorithms, a reinforcement learning agent obtains its samples and learns its policy in the course of interacting with the environment, and every interaction costs time and resources. For reinforcement learning applied to control systems in particular, excessive interaction can alter the environment and even damage the agent or the environment itself. Efficient reinforcement learning algorithms that can learn with minimal interaction cost therefore have great practical value. Two kinds of efficiency matter in reinforcement learning: sample efficiency and computational efficiency. Sample efficiency concerns how well the samples obtained from real interaction are used for policy learning; a sample-efficient algorithm should interact with the environment as little as possible. Computational efficiency concerns the amount of computation required to carry out the reinforcement learning task.

This paper proposes an active exploration reinforcement learning model to improve sample efficiency. The model is inspired by active learning, which improves classifier accuracy by selecting the most informative unlabeled examples: given how reinforcement learning acquires its samples, the regions of the state-action space the agent has not yet visited play a role analogous to the unlabeled samples in an active learning task. The active exploration model lets the agent actively select actions with high information content, so that it collects more informative samples and explores the parts of the state-action space it is most interested in. This improves exploration and interaction efficiency, accelerates convergence, and yields efficient reinforcement learning. The paper applies the active exploration model to specific reinforcement learning problems, further improving efficiency while preserving the advantages of existing algorithms. The work comprises the following three contributions.

(1) An active exploration initial sample collection algorithm is proposed. Based on the active exploration model, it collects the initial sample set for model-based reinforcement learning algorithms. The first step of model-based reinforcement learning is to build a sample set for training the initial model, and the choice of this set strongly affects model uncertainty, convergence speed, and whether the task can be completed at all. The proposed algorithm models the dynamics of the agent-environment interaction with a Gaussian process and selects each interaction action so as to maximize the information entropy of the predicted next state; interacting with the environment under this optimized policy yields an optimized initial sample set. Compared with random or deterministic strategies, this entropy-based sampling produces an initial sample set with higher information content, and the dynamics model learned from it better describes the true interaction between agent and environment. Using this method to build the initial sample set for a model-based reinforcement learning algorithm improves efficiency on nonlinear dynamic system control problems, raises the sample utilization of existing algorithms, and lets the agent complete the reinforcement learning task with fewer interactions.
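To make the idea of contribution (1) concrete, the following is a minimal sketch of entropy-driven initial sample collection. It assumes details the abstract does not state: a classic gym-style environment interface (`reset`/`step`/`action_space.sample`) with continuous state and action vectors, a scikit-learn Gaussian process as the dynamics model, and greedy selection over randomly sampled candidate actions in place of whatever action optimizer the thesis actually uses.

```python
# Hypothetical sketch: entropy-driven initial sample collection with a GP dynamics model.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def predictive_entropy(gp, state, action):
    """Differential entropy of the GP's Gaussian prediction for the next state."""
    x = np.concatenate([state, action]).reshape(1, -1)
    _, std = gp.predict(x, return_std=True)              # predictive std of next state
    # Entropy of an independent Gaussian: 0.5 * sum(log(2*pi*e*sigma^2))
    return 0.5 * np.sum(np.log(2 * np.pi * np.e * std ** 2))

def collect_initial_samples(env, n_steps=50, n_candidates=64):
    """Greedily pick the action whose predicted next state has maximal entropy."""
    X, Y = [], []                                         # (state, action) -> next-state pairs
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    state = env.reset()
    for _ in range(n_steps):
        candidates = [env.action_space.sample() for _ in range(n_candidates)]
        if len(X) < 2:                                    # no useful model yet: act randomly
            action = candidates[0]
        else:
            action = max(candidates, key=lambda a: predictive_entropy(gp, state, a))
        next_state, _, done, _ = env.step(action)
        X.append(np.concatenate([state, action]))
        Y.append(next_state)
        gp.fit(np.array(X), np.array(Y))                  # refit dynamics model on all data
        state = env.reset() if done else next_state
    return np.array(X), np.array(Y)
```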
(2) An active exploration model-based reinforcement learning algorithm, AEPILCO, is proposed. Based on the active exploration model, it optimizes the policy update of a model-based reinforcement learning algorithm. The traditional PILCO algorithm is a model-based method whose policy update objective minimizes the distance between the predicted state and the target state; it relies on natural exploration to balance exploration and exploitation, and during policy learning it cannot fully account for the effect of policy updates on model accuracy. When AEPILCO builds the policy update objective, it adds an active exploration term that maximizes the information entropy of the next state, so the agent learns an optimized policy, and the samples generated by interacting with this policy are in turn beneficial for model training (a sketch of such an objective follows the third contribution below). On nonlinear dynamic system control problems, this active-exploration-based optimization significantly improves sample efficiency at a time cost close to that of the original algorithm, so the agent completes the reinforcement learning task with fewer interactions.

(3) An active exploration model-free reinforcement learning algorithm, AEDDPG, is proposed. It adopts the actor-critic structure of DDPG, with the actor network modeling the policy and the critic network modeling the action-value function. On this basis, AEDDPG introduces an active exploration module, built on the active exploration reinforcement learning model, to assist DDPG in obtaining an optimized policy. The module uses a Gaussian process to build a dynamics model, accounts for the effect of policy learning on the states the agent may visit in the future, and maximizes the information entropy of those potential states as an auxiliary goal that indirectly assists the agent's policy learning. Integrating this active exploration objective into DDPG's policy evaluation effectively improves exploration efficiency. In experiments on policy learning for nonlinear dynamic systems, AEDDPG not only handles the high-dimensional state and action spaces beyond the scope of the previous two contributions, but also learns the policy with fewer interactions than natural exploration, achieving sample efficiency.
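The sketch below illustrates the kind of objective contribution (2) describes: a PILCO-style expected saturating cost over a model-predicted rollout, augmented with an information-entropy bonus on the predicted states. The function names, the weight `beta`, and the assumption of a diagonal state covariance are illustrative choices, not the thesis's actual formulation; the predicted means and variances are taken as given from the (Gaussian process) model rollout.

```python
# Hypothetical sketch: an AEPILCO-style policy objective combining a PILCO-style
# expected cost with an entropy bonus on the predicted states, weighted by `beta`.
import numpy as np

def expected_saturating_cost(mean, var, target, width=0.25):
    """Expected PILCO-style saturating cost of a Gaussian state with diagonal covariance."""
    # E[1 - exp(-|s - target|^2 / (2 w^2))] for s ~ N(mean, diag(var))
    scale = 1.0 / (width ** 2 + var)
    return 1.0 - np.prod(np.sqrt(width ** 2 * scale)) * np.exp(
        -0.5 * np.sum(scale * (mean - target) ** 2))

def gaussian_entropy(var):
    """Differential entropy of an independent Gaussian with per-dimension variance."""
    return 0.5 * np.sum(np.log(2 * np.pi * np.e * var))

def aepilco_objective(pred_means, pred_vars, target, beta=0.1):
    """Sum of expected costs over a predicted rollout minus an entropy bonus."""
    cost = sum(expected_saturating_cost(m, v, target) for m, v in zip(pred_means, pred_vars))
    bonus = sum(gaussian_entropy(v) for v in pred_vars)
    return cost - beta * bonus
```

Minimizing this objective trades off reaching the target state against visiting states the dynamics model is still uncertain about; the weight on the entropy term controls how strongly the policy is pushed toward informative samples.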
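For contribution (3), the sketch below shows one way an active exploration module could feed a Gaussian-process entropy bonus into DDPG-style critic targets. The placement of the bonus (added to the reward before bootstrapping), the weight `beta`, and the callable target actor/critic interfaces are assumptions for illustration; the thesis integrates the exploration objective into DDPG's policy evaluation, which may differ in detail.

```python
# Hypothetical sketch: adding a GP-based information-entropy bonus to DDPG-style
# bootstrapped critic targets. `gp` is a fitted dynamics model over (state, action).
import numpy as np

def exploration_bonus(gp, states, actions):
    """Per-transition entropy of the GP's next-state prediction."""
    x = np.concatenate([states, actions], axis=1)
    _, std = gp.predict(x, return_std=True)
    std = np.atleast_2d(std).reshape(len(states), -1)
    return 0.5 * np.sum(np.log(2 * np.pi * np.e * std ** 2), axis=1)

def augmented_critic_targets(gp, batch, target_actor, target_critic,
                             gamma=0.99, beta=0.05):
    """DDPG-style bootstrapped targets with an added information-entropy bonus."""
    states, actions, rewards, next_states, dones = batch
    bonus = exploration_bonus(gp, states, actions)
    next_actions = target_actor(next_states)          # deterministic target policy
    next_q = target_critic(next_states, next_actions)
    return rewards + beta * bonus + gamma * (1.0 - dones) * next_q
```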
Inspired by the ability of active learning to improve classifier accuracy, and taking into account how reinforcement learning acquires its samples, this paper proposes an active exploration reinforcement learning model and applies it to several specific reinforcement learning problems: collecting the initial sample set for model-based reinforcement learning, updating the policy in model-based reinforcement learning, and updating the policy in model-free reinforcement learning. The resulting efficient reinforcement learning algorithms reduce the agent's interaction cost and improve sample efficiency.
Keywords/Search Tags: Reinforcement learning, active exploration, information entropy, sample efficiency, Gaussian process