Research On Agent Decision-making And Control Based On Deep Reinforcement Learning

Posted on:2022-01-23

Degree:Master

Type:Thesis

Country:China

Candidate:D S Zhang

Full Text:PDF

GTID:2518306722488784

Subject:Computer technology

Abstract/Summary:

Deep reinforcement learning uses the representational power of deep learning to learn high-level features directly from state variables, avoiding extensive feature engineering, and it can successfully handle many control tasks. Although reinforcement learning has surpassed human performance on some tasks in recent years, it still struggles in complex domains that require learning high-level action policies with flexible control. Two problems are common. The first is low sample utilization: although the rapid development of deep reinforcement learning has produced remarkable results on many tasks, low sample utilization still restricts the field. The second is that purely discrete or purely continuous actions are poorly suited to robot control problems such as robot football: although the most popular algorithms perform extremely well on many complex continuous control tasks, they focus on discrete or continuous actions and rarely address the parameterized action space, which is well suited to robot control problems. This thesis proposes a solution to each of these two problems. The main research can be summarized as follows:

(1) To address low sample utilization, the thesis combines the twice active sampling method with the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. During experience replay, twice active sampling selects samples with high returns and large temporal-difference (TD) errors for learning with higher probability, while the remaining samples still have some probability of being sampled and learned from, thereby keeping the sampled data decorrelated and improving sample utilization. Experiments show that the proposed algorithm is effective in improving both sample utilization and performance.

(2) For robot control tasks that are difficult to handle with discrete or continuous actions alone, the thesis extends the popular Proximal Policy Optimization (PPO) algorithm to the parameterized action space. Parameterized actions combine discrete and continuous actions: they consist of a set of discrete actions, each associated with one or more continuous action-parameters that provide fine-grained control. This accounts both for the distinctions between kinds of actions that continuous actions ignore and for the difficulty of fine-tuning discrete actions to different situations. Finally, experiments on the Platform and Goal domains verify that the proposed algorithm improves performance.
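
The parameterized action space described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The action names, parameter bounds, and the `sample_action` helper are hypothetical and are not the action sets of the Platform or Goal domains. The probabilities and means stand in for the outputs of a policy's discrete and continuous heads, and a fixed Gaussian standard deviation is assumed for simplicity.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical action set: each discrete action maps to the (low, high)
# bounds of its own continuous parameters.
ACTION_SPEC: Dict[str, List[Tuple[float, float]]] = {
    "run": [(0.0, 30.0)],
    "hop": [(0.0, 720.0)],
    "kick_to": [(0.0, 40.0), (-15.0, 15.0)],
}


@dataclass(frozen=True)
class ParameterizedAction:
    """A discrete action together with its continuous parameters."""
    name: str
    params: Tuple[float, ...]


def sample_action(
    discrete_probs: Dict[str, float],
    param_means: Dict[str, List[float]],
    param_std: float,
    rng: random.Random,
) -> ParameterizedAction:
    """Sample a discrete action, then sample only that action's parameters."""
    names = list(discrete_probs)
    weights = [discrete_probs[n] for n in names]
    # Step 1: choose which discrete action to take.
    name = rng.choices(names, weights=weights, k=1)[0]
    # Step 2: draw each continuous parameter around its mean and clip to bounds.
    params = []
    for mean, (low, high) in zip(param_means[name], ACTION_SPEC[name]):
        value = rng.gauss(mean, param_std)
        params.append(min(max(value, low), high))
    return ParameterizedAction(name, tuple(params))


# Stand-ins for policy outputs (assumed values, for illustration only).
discrete_probs = {"run": 0.5, "hop": 0.3, "kick_to": 0.2}
param_means = {"run": [10.0], "hop": [200.0], "kick_to": [20.0, 0.0]}

rng = random.Random(0)
action = sample_action(discrete_probs, param_means, 2.0, rng)
print(action.name, action.params)
```

Only the parameters of the chosen discrete action are sampled, which is what separates a parameterized action from a flat vector that concatenates every action's parameters.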

Keywords/Search Tags:

Deep Reinforcement Learning, Twice Active Sampling, Twin Delayed Deep Deterministic Policy Gradient, Parameterized Action, Proximal Policy Optimization

Related items

1	Gait Analysis Of Quadruped Robot Based On Deep Reinforcement Learning
2	Self Learning Control Of Mechanical Arm Based On Reinforcement Learning
3	Exploration Strategy Of Deterministic Policy In Deep Reinforcement Learning
4	Deep Deterministic Policy Gradient Based On Entropy Regularization And Regular Update
5	Optimal Design Of Reconfigurable Intelligent Surfaces Enhanced Multi-User Communication Systems
6	Research On Fast Policy Gradient Algorithms Of Reinforcement Learning Based On Adaptive Learning Rate
7	Deep Reinforcement Learning Based On Policy Gradient Optimization And Its Application In Agent Control
8	Fast-PPO:Fast-Proximal Policy Optimization
9	Study Of Robot Arm Control Based On Deep Reinforcement Learning
10	Research On Off-policy Reinforcement Learning Algorithm