
Research On The Sparse Reward Problem Based On Hierarchical Reinforcement Learning

Posted on: 2021-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: L B Xing
Full Text: PDF
GTID: 2428330611966441
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of artificial intelligence in recent years, Reinforcement Learning (RL) has received extensive attention and research as a classic algorithm in the field. The sparse reward problem is a major challenge in RL; because it closely matches many real-world problems, it has become a research hotspot in artificial intelligence. Using RL to solve sparse-reward environments therefore has great practical significance and application value, with potential applications in fields such as drone control, autonomous driving and robotic manipulation. Hierarchical Reinforcement Learning (HRL) is an RL framework mainly used to address the sparse reward problem.

The main work of this thesis is to address three limitations of current HRL algorithms: low exploration efficiency in the environment, a lack of generalization ability, and a feature-perception module that cannot capture long-term correlations among features. To this end, we propose the following two HRL algorithms for sparse-reward problems:

1. A multi-goal HRL algorithm based on deep networks. This method builds on the existing FeUdal hierarchical structure and sets multiple subtasks for the execution policy at each time step, each corresponding to a subgoal. This allows the agent to obtain rewards more easily and to learn to control the environment along multiple dimensions at the same time, which greatly improves the learning ability and exploration efficiency of the algorithm.

2. A feature-perception method based on a self-attention mechanism and a nested LSTM. This method introduces a feature-perception module that captures long-term correlations both among the features of a single input image and across successive input images, which improves the algorithm's utilization of the input information. The module contains a self-attention component, dedicated to finding long-term correlations between convolutional features, and a nested LSTM component, which improves the memory of the feature extractor from the perspective of the input data; together they raise the exploration efficiency of our HRL algorithm in sparse-reward environments to a certain extent (see the sketch after this abstract).

To verify the effectiveness of the two proposed methods on sparse-reward tasks, this thesis conducts extensive experimental validation and comparison of the two algorithms in the Atari game environment, the most representative benchmark for reinforcement learning algorithms, against several representative HRL algorithms from recent years. The experimental results show that both algorithms significantly improve exploration efficiency and generalization performance compared to the baseline HRL algorithms.
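The abstract does not give implementation details, so the following is only a minimal sketch, in PyTorch, of the kind of feature-perception module described in point 2: single-head self-attention over the spatial positions of a convolutional feature map, followed by a nested LSTM cell in which the outer cell state is produced by an inner LSTM. All class names, sizes and design choices here are assumptions for illustration, not the author's actual architecture.

```python
# Hypothetical sketch of a self-attention + nested-LSTM feature-perception module.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttention(nn.Module):
    """Single-head self-attention over the spatial positions of a conv feature map."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned weight of the attended features

    def forward(self, x):                                  # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)       # (B, HW, C//8)
        k = self.key(x).flatten(2)                         # (B, C//8, HW)
        v = self.value(x).flatten(2)                       # (B, C, HW)
        attn = F.softmax(q @ k, dim=-1)                    # pairwise correlations between positions
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                        # residual connection


class NestedLSTMCell(nn.Module):
    """Nested LSTM cell: the additive cell update of a plain LSTM is replaced by an inner LSTM."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.gates = nn.Linear(input_size + hidden_size, 4 * hidden_size)
        self.inner = nn.LSTMCell(hidden_size, hidden_size)

    def forward(self, x, state):
        h, c, inner_c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=-1)).chunk(4, dim=-1)
        i, f, o, g = i.sigmoid(), f.sigmoid(), o.sigmoid(), g.tanh()
        # The inner LSTM takes i*g as its input and f*c as its previous hidden state;
        # its output hidden state becomes the new outer cell state.
        c_new, inner_c = self.inner(i * g, (f * c, inner_c))
        h_new = o * torch.tanh(c_new)
        return h_new, (h_new, c_new, inner_c)
```

In such a design the self-attention block would sit on top of the convolutional encoder to relate distant image regions within a frame, while the nested LSTM cell is stepped once per frame to keep a longer-term memory across frames before the features are passed to the hierarchical policy.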
Keywords/Search Tags: deep reinforcement learning, hierarchical reinforcement learning, nested LSTM, self-attention mechanism