
Research On Objects Grasping Method Based On Reinforcement Learning

Posted on: 2021-03-07    Degree: Master    Type: Thesis
Country: China    Candidate: K Y Liu    Full Text: PDF
GTID: 2428330611998163    Subject: Computer technology
Abstract/Summary:
This is an era of intelligence, and it poses new challenges both to traditional industrial robots and to service robots whose technology is not yet mature. Grasping, as one of a robot's most important abilities, has long been a research hotspot at home and abroad. Although grasping methods based on computer vision have benefited from the continuous development of deep learning and achieve high success rates, their efficiency and robustness remain poor. Reinforcement-learning-based grasping, by contrast, can complete the grasping task effectively through self-supervised learning in environments containing a wide variety of objects in arbitrary poses.

First, this thesis reviews traditional computer-vision-based grasping and the basic principles of reinforcement learning. According to whether the target policy and the behavior policy coincide, algorithms are divided into on-policy and off-policy. On-policy methods exploit only the currently known best choice and therefore converge easily to a local optimum; off-policy methods keep exploring and collect more diverse data, which helps them escape local optima. The off-policy TD3 algorithm performs well in the MuJoCo environments but is not directly suitable for robotic-arm grasping tasks. By replacing the deterministic policy in TD3 with the derivative-free optimization method CEM (CEM-TD3), the sparse-reward problem that hampers TD3 is alleviated, making the algorithm better suited to object grasping.

Then, a network structure is designed for the Q function in the CEM-TD3 algorithm. The grasping model uses a convolutional neural network as its main structure. The input is divided into state and action: features are extracted from the state by the convolutional layers, the action is merged in after a fully connected layer, and the output is the Q value. Pooling layers, batch normalization, dropout, and other techniques are used to optimize the network structure. To address the sparse-reward problem, the loss function is set to the cross-entropy loss of a classification problem; a reward-penalty method is used to accelerate network training; and enlarging the state information fed to the network improves the grasping success rate. The main procedure of the algorithm is to first fill a replay buffer, then train while continuing to store transitions, and to test and validate in stages.

Finally, a simulation experiment platform is built on the Bullet physics engine. During training, 5 training objects are placed at random positions in each episode, and any object lifted to the specified height counts as a successful grasp. To evaluate the algorithm, test objects never seen during training are selected and the trial is repeated 100 times to count the number of successes. The experiments verify the feasibility of the algorithm and the stability of the closed-loop control system.
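The core modification described above replaces TD3's deterministic actor with the cross-entropy method (CEM) for selecting the action that maximizes the learned Q function. A minimal sketch of that idea, assuming a generic `q_fn` and illustrative population sizes (the thesis's exact hyperparameters are not given here):

```python
import numpy as np

def cem_select_action(q_fn, state, action_dim, iters=3, pop=64, elites=6):
    """Pick an action maximizing q_fn(state, a) with the cross-entropy method.

    Derivative-free: sample a Gaussian population of actions, keep the
    top-scoring elites, refit the Gaussian to them, and repeat.
    All sizes here are illustrative, not the thesis's settings.
    """
    mu = np.zeros(action_dim)
    sigma = np.ones(action_dim)
    for _ in range(iters):
        # Sample candidate actions, clipped to a normalized [-1, 1] range
        samples = np.clip(np.random.randn(pop, action_dim) * sigma + mu, -1.0, 1.0)
        scores = np.array([q_fn(state, a) for a in samples])
        elite = samples[np.argsort(scores)[-elites:]]   # best candidates
        mu = elite.mean(axis=0)                         # refit the Gaussian
        sigma = elite.std(axis=0) + 1e-6                # avoid collapse to zero
    return mu  # mean of the final elite set
```

Because CEM only queries Q values, it sidesteps backpropagating through an actor network, at the cost of extra Q evaluations per action selection.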
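The Q-network description (a convolutional state branch with pooling, batch normalization, and dropout, with the action merged in after a fully connected layer) could be sketched as below. Channel widths, kernel sizes, the 64×64 input, and the 4-dimensional action are illustrative assumptions, not the thesis's exact design:

```python
import torch
import torch.nn as nn

class GraspQNet(nn.Module):
    """Q(s, a): conv stack on the image state, action joined after an FC layer."""

    def __init__(self, action_dim=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),        # (B, 32, 1, 1) state features
        )
        self.act_fc = nn.Linear(action_dim, 32)   # action branch
        self.head = nn.Sequential(
            nn.Linear(64, 64), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(64, 1),               # scalar Q value
        )

    def forward(self, img, action):
        s = self.conv(img).flatten(1)             # (B, 32)
        a = torch.relu(self.act_fc(action))       # (B, 32)
        return self.head(torch.cat([s, a], dim=1))  # (B, 1)
```

Merging the action after the convolutional layers lets the expensive image features be shared across the many action candidates CEM evaluates per state.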
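Treating the sparse grasp outcome (lifted or not) as a binary classification target, the cross-entropy loss mentioned above can be written directly; `grasp_bce_loss` is a hypothetical helper name for illustration:

```python
import numpy as np

def grasp_bce_loss(q_logits, success):
    """Binary cross-entropy between predicted grasp-success logits and the
    observed outcome (1 = object reached the target height, 0 = miss)."""
    p = 1.0 / (1.0 + np.exp(-q_logits))   # sigmoid -> success probability
    eps = 1e-12                           # numerical guard for log(0)
    return -np.mean(success * np.log(p + eps)
                    + (1 - success) * np.log(1 - p + eps))
```

An uninformative logit of 0 gives the chance-level loss ln 2 ≈ 0.693, while a confidently correct prediction drives the loss toward 0.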
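The training procedure (fill a replay buffer first, then train while continuing to store transitions) rests on a replay buffer; a minimal sketch, with the capacity and the transition tuple format as assumptions:

```python
import random
from collections import deque

class ReplayBuffer:
    """FIFO experience store; old transitions are evicted at capacity."""

    def __init__(self, capacity=100_000):
        self.buf = deque(maxlen=capacity)

    def add(self, transition):
        # transition: e.g. (state, action, reward, next_state, done)
        self.buf.append(transition)

    def sample(self, batch_size):
        # Uniform sample without replacement for a training mini-batch
        return random.sample(self.buf, min(batch_size, len(self.buf)))

    def __len__(self):
        return len(self.buf)
```

A typical off-policy loop would warm up the buffer with random grasps, then alternate `add` (from environment interaction) with `sample` (for gradient updates), evaluating in stages as the abstract describes.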
Keywords/Search Tags: Robotic Grasping, Reinforcement Learning, Grasping Convolutional Neural Network, Simulation