
Robot Control With Deep Reinforcement Learning

Posted on: 2021-06-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y D Wang
Full Text: PDF
GTID: 1488306557491484
Subject: Control theory and control engineering
Abstract/Summary
Reinforcement learning (RL) is a class of machine learning methods that optimizes a policy by maximizing long-term return through trial-and-error interaction with the environment. RL is highly effective for control and decision problems in which no system model is available. Recently, with the rapid development of deep learning and the growth of computing power, researchers have combined the feature-extraction ability of deep neural networks with the control and decision-making ability of RL, yielding deep reinforcement learning (DRL). DRL brings new solutions to traditional robot control. DRL-based control algorithms can optimize the control policy directly while interacting with the robot system, without knowledge of the system's dynamic model. By training deep neural networks with DRL algorithms, critical information can be extracted from raw high-dimensional sensor data and then used for control, which finally forms an end-to-end learning and control method from raw sensor data to control inputs. Moreover, in multi-robot coordination scenarios, robots can learn from interacting and communicating with one another under multi-agent reinforcement learning algorithms, and thereby form cooperative control policies that complete collaborative tasks. Although DRL-based control approaches have these theoretical advantages, many practical issues arise in real applications: how to guarantee that the controller is stable and safe during training; how to design an appropriate network structure for a specific type of sensor data; how to establish cooperative control policies under limited communication resources; and so forth. In this thesis, building on current DRL algorithms, three basic problems in robot control are investigated: motion control, navigation and obstacle avoidance, and multi-robot coordination. The
results provide both theoretical and technological guidance for robot control with DRL. The contributions of this thesis are summarized as follows:

(1) A deterministic policy gradient algorithm with an integral compensator is proposed for motion control of quadrotor unmanned aerial vehicles (UAVs) without knowing their accurate dynamic models. Because the dynamics of a quadrotor are under-actuated, nonlinear, and unstable, and an accurate dynamic model is hard to establish, the deep deterministic policy gradient algorithm is employed. A deep neural network is used to build a map from states to motor inputs, and its parameters are updated by the algorithm under a predesigned reward function. To address the steady-state error that arises when the original algorithm is used to train the policy, an integral compensation method is introduced and the training algorithm is improved accordingly, so that a control policy with better precision can be learned. Moreover, a two-phase learning protocol is proposed to improve safety during training: a robust controller is first obtained in an offline learning phase; this controller is then used in the online learning phase, where the parameters are fine-tuned to further improve performance. Experiments in a high-fidelity quadrotor simulator show that the proposed DRL method yields a motion controller with good dynamic performance and strong robustness to various disturbances, without requiring an accurate dynamic model of the quadrotor.

(2) A modular deep reinforcement learning algorithm is proposed to solve the navigation and obstacle avoidance problem for mobile robots. Concretely, the algorithm controls a ground mobile robot equipped with a 2D LIDAR (laser range finder) so that it avoids moving obstacles and approaches a specified location in a complex unknown environment. Existing obstacle avoidance and path planning methods are usually carried out under a known environment map, but face greater difficulties in
unknown and dynamic environments. First, a modular obstacle avoidance method is designed based on deep Q-learning. To handle moving obstacles, a novel two-stream Q-network structure is proposed to process the LIDAR data; the motion information of the moving obstacles is merged into the input states so that the robot obtains a more comprehensive observation of its surroundings. Next, the local obstacle avoidance policy and the global navigation policy are pre-trained separately as reinforcement learning modules, and the two policies are then merged through online training with an action scheduling method. Finally, results in a robotics simulation environment show that the proposed navigation and avoidance policy has clear advantages over the conventional deep Q-learning method and traditional robotics methods in both learning speed and final performance.

(3) A multi-robot cooperative control method based on multi-agent reinforcement learning is proposed, which solves the cooperative hunting problem in multi-robot pursuit-evasion games. On top of a single-agent DRL algorithm, a learning-based communication mechanism and a centralized-training, distributed-execution structure are introduced, which enable the pursuer robots to learn cooperative control and communication policies while interacting with the other pursuers and the evader. The policies can be developed even when the dynamics of the robots and the structure of the environment are entirely unknown. To reduce the communication and computation resources required in execution, two simple communication network topologies are designed: a ring topology and a leader-follower line topology, together with a training algorithm for each. Experimental results show that the proposed method achieves pursuit performance comparable to a centralized RL method while using fewer communication and computation resources.

(4) A control approach and an aerial image processing method for
the unmanned surface vehicle (USV) and UAV cooperative maritime reconnaissance mission are proposed. First, a position and angle estimation method is developed with a deep convolutional neural network and spatial softmax, which estimates the position and angle of the UAV and of the reconnaissance target from aerial images taken by the UAV. Then, the twin delayed deep deterministic policy gradient algorithm is introduced to generate the USV controller; after training in a simulator, the learned policy can drive the USV to approach the target rapidly under the interference of sea waves. Based on the UAV controller given in the first part, a safe landing procedure for autonomous UAV landing on the USV is developed; the procedure anticipates risk during the landing and orders the UAV to climb to a safe altitude if necessary. A series of experiments on the UAV-USV cooperative control simulator validate the effectiveness of the proposed approaches.
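The integral compensator of contribution (1) can be illustrated with a minimal sketch. The function name, gains, and toy dimensions below are assumptions for illustration, not the thesis's actual formulation: the idea is that the policy's observation is augmented with an accumulated tracking-error channel, so any steady-state offset keeps growing the integral term until the learned policy cancels it, much like the I term of a PID controller.

```python
import numpy as np

def augment_observation(state, target, integral, dt=0.02, k_i=0.1):
    """Append error and integral-of-error channels to the raw state.

    The DRL policy then observes [state, error, integral] instead of
    just [state]; the integral channel gives it the information needed
    to eliminate steady-state error.  k_i and dt are illustrative.
    """
    error = target - state
    integral = integral + k_i * error * dt   # accumulated tracking error
    obs = np.concatenate([state, error, integral])
    return obs, integral

# toy usage: a 3-D position state tracked toward a hover target
state = np.zeros(3)
target = np.array([1.0, 0.0, 2.0])
integral = np.zeros(3)
obs, integral = augment_observation(state, target, integral)
print(obs.shape)  # (9,)
```

Because the compensator only changes the observation vector, it can be bolted onto a standard deterministic policy gradient learner without modifying the actor or critic update rules.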
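The resource argument behind the ring topology of contribution (3) can be sketched as follows. This is a generic illustration, not the thesis's communication protocol: in a ring, each of the n pursuers transmits its learned message vector to exactly one neighbor per step, so communication cost grows as O(n) rather than the O(n^2) of all-to-all broadcast.

```python
import numpy as np

def ring_exchange(messages):
    """One round of ring-topology communication.

    messages: (n_agents, msg_dim) array of per-agent message vectors.
    Agent i receives the message of agent (i - 1) mod n, i.e. each
    agent sends to a single downstream neighbor.
    """
    return np.roll(messages, shift=1, axis=0)

msgs = np.arange(4 * 2, dtype=float).reshape(4, 2)  # 4 pursuers, 2-D messages
received = ring_exchange(msgs)
# agent 0 now holds agent 3's message
print(received[0])  # [6. 7.]
```

Under centralized training, the message contents themselves are produced by each agent's network and trained end-to-end; only this one-neighbor exchange pattern is fixed in advance.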
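The spatial softmax layer used in contribution (4) is a standard construction that can be sketched with NumPy. The grid normalization and map sizes below are conventional choices, not taken from the thesis: a softmax over all pixel locations of a CNN feature map is used to weight a fixed coordinate grid, giving the expected (x, y) image position of the activation mass; stacking one such output per channel yields a compact position estimate from an aerial image.

```python
import numpy as np

def spatial_softmax(feature_map):
    """Reduce a single-channel feature map (H, W) to the expected
    (x, y) coordinate of its activation, with coordinates normalized
    to [-1, 1] in each axis."""
    h, w = feature_map.shape
    flat = feature_map.reshape(-1)
    flat = flat - flat.max()                      # numerical stability
    weights = np.exp(flat) / np.exp(flat).sum()   # softmax over pixels
    ys, xs = np.mgrid[0:h, 0:w]
    xs = (2.0 * xs.reshape(-1) / (w - 1)) - 1.0   # column -> x in [-1, 1]
    ys = (2.0 * ys.reshape(-1) / (h - 1)) - 1.0   # row    -> y in [-1, 1]
    return float((weights * xs).sum()), float((weights * ys).sum())

# a sharp peak at the center of a 5x5 map maps to roughly (0, 0)
fmap = np.zeros((5, 5))
fmap[2, 2] = 50.0
print(spatial_softmax(fmap))  # ≈ (0.0, 0.0)
```

Unlike a fully connected readout, this reduction is differentiable, needs no learned parameters, and degrades gracefully when the activation peak moves, which is why it is a common choice for pose estimation heads.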
Keywords/Search Tags: deep reinforcement learning, multi-agent reinforcement learning, robot control, multi-robot coordination