
Research On Reinforcement Learning Algorithm Based On Parallel Sampling And Behavior Induction

Posted on: 2024-02-28    Degree: Master    Type: Thesis
Country: China    Candidate: K L Zeng    Full Text: PDF
GTID: 2568306914972529    Subject: Control Science and Engineering
Abstract/Summary:
Reinforcement learning has recently achieved considerable success in a variety of fields, demonstrating its enormous potential for handling sequential decision problems. The goal of reinforcement learning is to gather experience in new environments and thereby acquire efficient policies for completing tasks. Owing to the coupled effects of factors such as the high-dimensional state space of the interactive environment, environmental noise, and sparse rewards, it is challenging for agents to learn optimal policies in complex decision-making and control tasks. To address this, a reinforcement learning method is developed that combines parallel sampling and behavior induction with environmental prior knowledge. The specific research content and results are as follows.

(1) A hopper experiment based on the DeepMind Control Suite is built, and exploration methods such as the Intrinsic Curiosity Module (ICM), the Active Pre-Training (APT) algorithm, and the State Marginal Matching (SMM) algorithm are implemented. The strengths and weaknesses of these algorithms are assessed through analysis of the experimental results, which provides a theoretical foundation and experimental support for the subsequent work.

(2) To address the low sampling efficiency of state-action pairs in continuous-action environments, a reinforcement learning exploration algorithm based on particle entropy and behavior induction (Active Pre-training algorithm for Diverse behavior induction, APD) is proposed. The state encoder is improved with a behavior-contrastive representation loss that maps high-dimensional states into a latent space. A particle entropy estimator is introduced to optimize the mutual information objective and is combined with a behavior update mechanism for state sampling to construct a dynamic intrinsic reward, thereby improving exploration of unknown environments (both components are sketched after the abstract). The results demonstrate that exploration efficiency increases significantly and that the cumulative reward of this method is at least 28% higher than that of previous algorithms in the Jaco Arm environment.

(3) To address the low robustness caused by an unstable behavior-label update mechanism during fine-tuning on downstream tasks, an exploration optimization algorithm based on a variational autoencoder (VAE) and parallel sampling is proposed. A behavior update method is designed using parallel sampling and temporal peak detection, which comprehensively determines the behavior labels of unexplored regions. To increase the effectiveness of behavior policies, a VAE encoder is adopted to probabilistically encode action intents and to enhance the particle-entropy-based behavior induction model (a sketch also follows the abstract). Experimental results show that the proposed algorithm further improves performance, with a cumulative reward at least 33% higher than that of the other algorithms in the Jaco Arm environment.
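The abstract does not give the exact form of the particle entropy reward in (2), but APD builds on the APT family, where the intrinsic reward is typically a k-nearest-neighbor particle estimate of state entropy in the latent space. A minimal PyTorch sketch of such an estimator follows; the function name, the choice k=12, and the purely batch-wise estimation are illustrative assumptions, not the thesis's exact implementation.

```python
import torch

def particle_entropy_reward(z, k=12, eps=1e-6):
    """k-nearest-neighbor particle entropy estimate used as an
    intrinsic reward: states whose latent embeddings lie far from
    their neighbors (novel regions) receive larger rewards.

    z: (batch, dim) latent state embeddings from the encoder.
    Returns a (batch,) tensor of intrinsic rewards.
    """
    # Pairwise Euclidean distances between all embeddings in the batch.
    dist = torch.cdist(z, z)                       # (batch, batch)
    # The k+1 smallest distances include the zero self-distance; drop it.
    knn_dist, _ = dist.topk(k + 1, largest=False)  # (batch, k+1)
    knn_dist = knn_dist[:, 1:]                     # (batch, k)
    # The log of the average distance to the k nearest particles
    # approximates each state's local entropy contribution
    # (up to additive constants).
    return torch.log(knn_dist.mean(dim=1) + eps)
```

In this sketch the embeddings z would come from the behavior-induced state encoder, and the resulting reward would be added to the learning signal during pre-training.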
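Likewise, the behavior-contrastive representation loss used to train the state encoder is not spelled out in the abstract; an InfoNCE-style contrastive objective is one standard realization. The sketch below assumes paired embeddings of states from the same behavior as positives, with the rest of the batch serving as negatives; the function name and temperature value are hypothetical.

```python
import torch
import torch.nn.functional as F

def behavior_contrastive_loss(anchor, positive, temperature=0.1):
    """InfoNCE-style contrastive loss: embeddings of states generated
    by the same behavior are pulled together, while other states in
    the batch act as negatives.

    anchor, positive: (batch, dim) latent embeddings.
    """
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    logits = a @ p.t() / temperature   # (batch, batch) cosine similarities
    # Diagonal entries pair each anchor with its own positive.
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)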
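For contribution (3), the abstract states only that a VAE encoder probabilistically encodes action intents. A minimal sketch of such an encoder, assuming the intent is inferred from a short action sequence and regularized toward a standard Gaussian prior, is given below; the class name, architecture, and dimensions are all illustrative.

```python
import torch
import torch.nn as nn

class IntentVAE(nn.Module):
    """Sketch of a VAE encoder mapping an action sequence to a
    Gaussian posterior over a latent 'intent' code."""
    def __init__(self, action_dim, horizon, latent_dim=8, hidden=128):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Linear(action_dim * horizon, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, latent_dim)
        self.log_var = nn.Linear(hidden, latent_dim)

    def forward(self, actions):  # actions: (batch, horizon, action_dim)
        h = self.enc(actions.flatten(1))
        mu, log_var = self.mu(h), self.log_var(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)
        # KL term regularizing the intent posterior toward N(0, I).
        kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp(), dim=1)
        return z, kl
```

Under this reading, the sampled intent code z would condition or filter the behavior induction model, while the KL term keeps the intent distribution well behaved across parallel samplers.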
Keywords/Search Tags: reinforcement learning, exploration strategy, intrinsic reward, mutual information, particle entropy