
Highly-efficient Robot Self-learning With Deep Reinforcement Learning

Posted on: 2021-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Lin
Full Text: PDF
GTID: 2518306470961409
Subject: Mechanical engineering
Abstract/Summary:
Deep reinforcement learning (Deep RL) is a novel and promising mathematical framework in the domain of Artificial Intelligence that combines the decision-making ability of Reinforcement Learning with the generalization ability of Deep Learning. Deep RL can train an agent's control policy in an end-to-end manner, achieving direct control from high-dimensional raw observations alone. This technique, which can be viewed as a key to intelligent robot manipulation, enables adaptive robot control driven only by sensing data, giving the robot a simpler programming process and powerful generalization ability. Despite some breakthroughs in Deep-RL-based robot learning, high sample complexity, low learning efficiency, and the long training time required to obtain a satisfactory control policy hinder the application of Deep RL on physical robots.

Building on previous work, this dissertation studies how to improve the efficiency of robot learning with deep reinforcement learning. To address the challenges of sparse-reward, multi-goal manipulation tasks in high-dimensional continuous state and action spaces, such as long training times and difficulty in converging, it proposes algorithms and solutions that improve the training efficiency and robustness of robot learning. The main work includes:

1. A mathematical framework called Invariant Transform Experience Replay (ITER) is proposed to reduce the sample complexity of robot learning. The core idea of ITER is to augment the observed transition samples without altering their dynamics, generating additional feasible transitions and thereby improving the robustness, training efficiency, and policy generalization of robot learning.

2. Two algorithms are proposed within the ITER framework. The first, Kaleidoscope Experience Replay (KER), applies spatially invariant transforms, including symmetry and rotation, to generate additional invariant transition samples, enlarging the experience-replay training set by dozens of times and thereby greatly improving learning efficiency. The second, Goal-augmented Experience Replay (GER), exploits the tolerance built into the reward function's definition of success: any hindsight goal can be replaced by a random goal sampled from a small ball centered on it, within which the task still counts as a success.

3. An experimental simulation platform is built to validate the proposed algorithms on three basic robotic manipulation tasks (pushing, sliding, and pick-and-place). The experimental observations are discussed with respect to learning efficiency and robustness, in comparison with a traditional algorithm. The results show that the proposed method significantly increases learning rates and success rates, attaining speedups of 13, 3, and 5 times on the pushing, sliding, and pick-and-place tasks, respectively. Moreover, the proposed methods also perform well in tasks with obstacles.

4. Regarding a pathological phenomenon observed during the training process of robot self-learning, several hypotheses are examined experimentally and analyzed theoretically, including: 1) a mismatch between the state distribution of the samples induced by KER and that of the current control policy; 2) overfitting of the policy to the distribution of the augmented training data; and 3) the instability of the DDPG algorithm. The dissertation proves, both theoretically and experimentally, that it is the DDPG algorithm, rather than the proposed algorithms, that leads to the pathological phenomenon, further verifying the robustness and generalization ability of the proposed algorithms.

5. A ROS-based control system for a real-world Baxter robot is implemented, whose control strategy is a well-trained ITER-based deep RL policy. A pick-and-place experiment is conducted, and the experimental result shows that the proposed method is effective and reliable.
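As a rough illustration of the two augmentations described above, the sketch below shows how a single goal-conditioned transition might be expanded: a KER-style step rotates every positional quantity about the vertical axis (an invariance-preserving rigid transform), and a GER-style step resamples the goal from a small ball around the achieved goal. This is a minimal sketch under assumed conventions (3-D positions, a tolerance-based success criterion); the function names, transition keys, and parameters are hypothetical and not taken from the dissertation itself.

```python
import numpy as np

def rotate_z(p, theta):
    """Rotate a 3-D point about the vertical z-axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return R @ np.asarray(p, dtype=float)

def ker_augment(transition, n_rotations=8):
    """KER-style augmentation: return rotated copies of a transition.

    Every value in the transition is assumed to be a 3-D position, so one
    rigid rotation applied to all of them preserves the task dynamics.
    """
    augmented = []
    for k in range(1, n_rotations + 1):
        theta = 2.0 * np.pi * k / (n_rotations + 1)  # skip the identity
        augmented.append({key: rotate_z(val, theta)
                          for key, val in transition.items()})
    return augmented

def ger_augment(transition, radius=0.02, rng=None):
    """GER-style augmentation: replace the hindsight goal with a random
    goal drawn uniformly from a small ball around the achieved goal,
    within which the tolerance-based reward still counts a success."""
    rng = np.random.default_rng() if rng is None else rng
    g = np.asarray(transition["achieved_goal"], dtype=float)
    direction = rng.normal(size=3)
    direction /= np.linalg.norm(direction)
    # radius * u^(1/3) makes the sample uniform over the ball's volume
    offset = direction * radius * rng.uniform() ** (1.0 / 3.0)
    new_transition = dict(transition)
    new_transition["goal"] = g + offset
    return new_transition
```

In an actual replay buffer, both augmentations would be applied to full `(state, action, next_state, goal, reward)` tuples before storage, so the policy trains on many consistent variants of each observed interaction.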
Keywords/Search Tags: Robot learning, Deep Reinforcement Learning, Data Augmentation, Invariant Transform Experience Replay