
Research On Path Planning Of Warehouse Handling Robot Based On Deep Reinforcement Learning

Posted on: 2022-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: C T Rong
Full Text: PDF
GTID: 2481306329952779
Subject: Control Science and Engineering
Abstract/Summary:
With the continuous expansion of oilfield development and production, the task of ensuring the supply of materials has grown, and research on oilfield storage and handling robots has attracted increasing attention. The task of the oilfield storage and handling robot is to replace workers in issuing and storing downhole tools in the storage room. Because the storage-room environment changes continuously as tools are issued and stored, the robot must avoid both dynamic and static obstacles, and is therefore required to perform path planning effectively in an unknown dynamic environment. Deep Reinforcement Learning (DRL) algorithms select optimal actions through interaction with the environment without relying on any prior knowledge. In this paper, a DRL algorithm is combined with the path planning task of the oilfield storage and handling robot, and its effectiveness is verified in the Gazebo simulation environment. The main research contents are as follows:

First, to address the tendency of the Deep Deterministic Policy Gradient (DDPG) algorithm to fall into local optima, this paper introduces a pair of critic networks and selects the minimum of the Q-values produced by the two critics as the target value for updating the actor network parameters. This reduces the overestimation bias generated during network training and helps the algorithm converge to a better policy.

Second, to address the slow convergence of DDPG, this paper improves the sample priority in the experience replay mechanism: the sum of the two TD-errors produced by the two critic networks and the immediate reward of the sample is used as the sample's priority. Both the immediate reward and the absolute value of the TD-error are taken to be linearly related to sample importance, so this priority accounts for the influence of both factors during sampling and thereby accelerates convergence. The proposed method is evaluated in multiple experimental environments against multiple comparison algorithms on the OpenAI Gym platform to verify the effectiveness of the improved algorithm.

Finally, the improved method is applied to the path planning task of the oilfield storage and handling robot to verify its effectiveness. First, the path planning task is described and a solution is proposed, including the definition of the state space and action space. Second, the robot model is established node by node, the path planning environment model is built with dynamic and static obstacles, and the network model and reward function of the algorithm are designed. Third, the improved algorithm is combined with the path planning task in simulation experiments, and the results of the comparison experiments are analyzed to verify the effectiveness of the improved algorithm for path planning in an unknown dynamic environment.
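The double-critic target described in the first contribution can be sketched as follows. This is a minimal illustration of taking the minimum of two critics' Q-estimates when forming the TD target (as in clipped double-Q learning); the function name and the simplified scalar interface are assumptions for illustration, not the thesis's exact implementation.

```python
import numpy as np

def td_target(reward, done, next_q1, next_q2, gamma=0.99):
    """TD target using the minimum of the two critics' Q-estimates for the
    next state-action pair, which suppresses overestimation bias.
    `done` is 1.0 for terminal transitions (no bootstrapping), else 0.0."""
    min_q = np.minimum(next_q1, next_q2)
    return reward + gamma * (1.0 - done) * min_q
```

Both critics are then regressed toward this single shared target, while the actor is updated against one (or the minimum) of the critics.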
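The modified replay priority from the second contribution, |TD-error 1| + |TD-error 2| + immediate reward, can be sketched as below, together with proportional sampling. The floor value `eps`, the exponent `alpha`, and the clamping of negative-reward transitions are assumptions added here to keep every priority positive; the abstract does not specify how the thesis handles these details.

```python
import numpy as np

def sample_priority(td_error1, td_error2, reward, eps=1e-3):
    """Priority = |TD-error of critic 1| + |TD-error of critic 2| + immediate
    reward. The sum is floored at a small eps so that transitions with large
    negative rewards remain sampleable (an assumption, not from the thesis)."""
    return max(abs(td_error1) + abs(td_error2) + reward, eps)

def sample_indices(priorities, batch_size, alpha=0.6, rng=None):
    """Draw transition indices with probability proportional to priority^alpha."""
    if rng is None:
        rng = np.random.default_rng(0)
    p = np.asarray(priorities, dtype=float) ** alpha
    return rng.choice(len(priorities), size=batch_size, p=p / p.sum())
```

Because the priority grows with both TD-error and reward, transitions that are either surprising to the critics or directly rewarding are replayed more often, which is the mechanism the thesis credits for faster convergence.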
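The abstract states that a reward function is designed for the robot's path planning task but does not give its form. A common shape for such tasks, shown purely as an illustrative assumption (all constants and the shaping term are hypothetical, not the thesis's values), combines terminal goal/collision rewards with dense progress shaping:

```python
def step_reward(dist_prev, dist_now, collided, reached,
                goal_reward=100.0, collision_penalty=-50.0, shaping=10.0):
    """Illustrative per-step reward for mobile-robot path planning:
    a large terminal bonus on reaching the goal, a penalty on collision,
    and otherwise dense shaping proportional to progress toward the goal."""
    if reached:
        return goal_reward
    if collided:
        return collision_penalty
    # Positive when the robot moved closer to the goal this step.
    return shaping * (dist_prev - dist_now)
```

The dense shaping term matters in the storage-room setting: with only sparse terminal rewards, a DDPG-style agent navigating among moving obstacles would rarely observe a nonzero signal early in training.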
Keywords/Search Tags:Deep Reinforcement Learning, Gazebo, Mobile Robot, Path Planning, ROS