
Research On Agent Decision-making And Control Based On Deep Reinforcement Learning

Posted on: 2022-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: D S Zhang
Full Text: PDF
GTID: 2518306722488784
Subject: Computer technology
Abstract/Summary:
Deep reinforcement learning uses the powerful representation capabilities of deep learning to learn high-level features directly from state variables, which avoids extensive feature engineering and allows it to handle many control tasks successfully. Although reinforcement learning has surpassed human performance on some tasks in recent years, it still encounters many problems in complex domains that require learning high-level action policies with flexible control. Two problems are common. The first is low sample utilization: although the rapid development of deep reinforcement learning has produced remarkable task performance, low sample utilization still restricts progress in the field. The second is that purely discrete or purely continuous actions handle robot control problems such as robot soccer poorly: although the most popular algorithms perform extremely well on many complex continuous control tasks, they focus on discrete or continuous actions and rarely address the parameterized action space, which is well suited to robot control problems. This thesis proposes a solution to each of these two problems. The main research can be summarized as follows:

(1) To address low sample utilization, this thesis combines the twice active sampling method with the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. Twice active sampling is introduced into experience replay: samples with high returns and large Temporal-Difference (TD) errors are selected for learning with higher probability, while every other sample retains some probability of being selected. This keeps the sampled data decorrelated and improves sample utilization. Experiments show that the proposed algorithm improves both sample utilization and performance.

(2) For robot control tasks that are difficult to handle with discrete or continuous actions alone, this thesis takes the currently popular Proximal Policy Optimization (PPO) algorithm and extends it to the parameterized action space. Parameterized actions combine discrete and continuous actions: they consist of a set of discrete actions, each associated with one or more continuous action-parameters that provide fine-grained control. This accounts both for the differences among actions of the same kind, which continuous actions ignore, and for the difficulty of fine-tuning discrete actions to different situations. Experiments on the Platform and Goal domains verify that the proposed algorithm improves performance.
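The sampling idea in (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: the two-stage order (returns first, then TD errors), the function name `twice_active_sample`, the `candidate_factor` and `eps` parameters, and the transition fields `ret` and `td` are all assumptions made for illustration. The small constant `eps` gives every transition a nonzero chance of being drawn.

```python
import random


def twice_active_sample(buffer, batch_size, candidate_factor=4, eps=1e-3, rng=None):
    """Two-stage prioritized draw from a replay buffer (illustrative only).

    buffer: list of dicts with hypothetical keys
        'ret' (return of the episode the transition came from) and
        'td'  (most recent TD error of the transition).
    """
    if not buffer:
        raise ValueError("buffer is empty")
    rng = rng or random.Random()

    # Stage 1: draw a candidate pool, favouring high-return transitions.
    # Returns are shifted so that all weights are strictly positive.
    n_candidates = min(len(buffer), batch_size * candidate_factor)
    min_ret = min(t["ret"] for t in buffer)
    ret_weights = [t["ret"] - min_ret + eps for t in buffer]
    candidates = rng.choices(buffer, weights=ret_weights, k=n_candidates)

    # Stage 2: draw the training batch from the pool, favouring
    # transitions with a large absolute TD error.
    td_weights = [abs(t["td"]) + eps for t in candidates]
    return rng.choices(candidates, weights=td_weights, k=batch_size)


if __name__ == "__main__":
    # Toy buffer: the first 10 transitions have high return and TD error.
    demo = [
        {"id": i, "ret": 10.0 if i < 10 else 0.0, "td": 5.0 if i < 10 else 0.1}
        for i in range(100)
    ]
    batch = twice_active_sample(demo, batch_size=32, rng=random.Random(0))
    print(sum(t["id"] < 10 for t in batch), "of", len(batch), "are high-priority")
```

Both stages sample with replacement, which keeps the sketch short; a practical replay buffer would more likely use a sum-tree and importance-sampling weights, neither of which the abstract describes.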
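The parameterized action structure described in (2) can be sketched as follows. This only illustrates how such an action might be represented and sampled; it is not the thesis's PPO extension. The action names, the parameter dimensions in `ACTION_PARAM_DIMS`, the fixed `param_std`, and the function `sample_action` are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterizedAction:
    name: str      # the discrete action that was chosen
    params: tuple  # the continuous parameters of that discrete action


# Hypothetical action set: each discrete action has its own parameter count.
ACTION_PARAM_DIMS = {"kick_to": 2, "shoot_left": 1, "shoot_right": 1}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, param_means, param_std=0.1, rng=None):
    """Pick a discrete action, then sample only that action's parameters.

    logits:      one score per discrete action, in ACTION_PARAM_DIMS order.
    param_means: dict mapping action name -> sequence of parameter means.
    """
    rng = rng or random.Random()
    names = list(ACTION_PARAM_DIMS)
    probs = softmax(logits)
    name = rng.choices(names, weights=probs, k=1)[0]
    # Gaussian noise around the mean stands in for a stochastic policy head.
    params = tuple(rng.gauss(mu, param_std) for mu in param_means[name])
    return ParameterizedAction(name, params)


if __name__ == "__main__":
    means = {"kick_to": (0.5, -0.2), "shoot_left": (0.1,), "shoot_right": (-0.1,)}
    print(sample_action([1.0, 0.0, 0.0], means, rng=random.Random(0)))
```

In a learned policy the logits and parameter means would come from network outputs conditioned on the state; here they are passed in directly so the sketch stays self-contained.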
Keywords/Search Tags: Deep Reinforcement Learning, Twice Active Sampling, Twin Delayed Deep Deterministic Policy Gradient, Parameterized Action, Proximal Policy Optimization