
Research Of Path Planning Problem Based On Reinforcement Learning

Posted on: 2018-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Zhao
GTID: 2348330536481927
Subject: Computer Science and Technology
Abstract/Summary:
The field of man-machine symbiosis under uncertainty centers on state awareness built on machine learning, on path planning and decision making, and on the evaluation of decision results. It involves both scientific theory and engineering problems, and has clear theoretical significance and practical value. This thesis studies a reinforcement learning solution to agent path planning in an unknown environment.

Path planning for a robot or agent in a given environment means finding a collision-free trajectory along which the robot can travel from the start location to the goal location as quickly as possible. Path planning has a long research history and has produced many mature algorithms. However, most of these algorithms rely on a model of the environment combined with a search method. This approach has several drawbacks. First, in many cases the environment model is very difficult to obtain. Second, because of control errors or environmental factors, the robot cannot follow the planned path exactly and ends up deviating from the goal. Third, the planned path may be tortuous and full of inflection points, which makes it poorly suited for a robot to follow.

To address these problems, this thesis proposes solving the path planning problem with reinforcement learning, and proposes an optimized solution to the exploration-exploitation dilemma in reinforcement learning. The main contents of this thesis are as follows:

(1) Using the temporal difference method to solve the path planning problem. Compared with other algorithms, its advantage is that the environment does not need to be modeled; the method is adaptive and self-learning, so it can cope with movement uncertainty. The algorithm is validated by simulation experiments. The results show that the temporal difference method converges quickly and that a path to the target can be found from any position.

(2) Studying the balance between exploration and exploitation. Two processes are always present in reinforcement learning: exploration and exploitation. Too much exploration lengthens training, while too much exploitation makes the agent converge to a wrong solution. How to balance the two has become an important research direction. Traditional methods usually reduce exploration as training time increases, without considering the complexity of the environment or of the problem itself. This thesis uses the success rate with which the agent reaches the target as the criterion and adjusts the exploration factor dynamically for the path planning problem: while the agent is still unfamiliar with the environment it explores more, and once it has become familiar it exploits more. The results show that the improved method balances exploration and exploitation better and lets the agent reach the target faster.
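
As an illustration of the two ideas summarized above, the following is a minimal sketch and not the thesis's implementation. It assumes tabular Q-learning as the temporal difference method (the abstract does not say which variant is used), a hypothetical 5x5 grid, hypothetical reward values, a 10% chance that a move is replaced by a random one (standing in for movement uncertainty), and a hypothetical rule that sets the exploration factor to one minus the success rate over the last 20 episodes, with a floor of 0.05. The grid, the function names and all parameters are assumptions made for this example.

```python
import random

# Hypothetical 5x5 map: 'S' start, 'G' goal, '#' obstacle, '.' free cell.
GRID = [
    "S..#.",
    ".#...",
    ".#.#.",
    "...#.",
    ".#..G",
]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def find(ch):
    """Return the (row, col) of the first cell containing `ch`."""
    for r, row in enumerate(GRID):
        c = row.find(ch)
        if c >= 0:
            return (r, c)
    raise ValueError(ch)


START, GOAL = find("S"), find("G")


def step(state, action, rng, slip=0.1):
    """Apply one move; with probability `slip` a random action is used instead."""
    if rng.random() < slip:
        action = rng.randrange(len(ACTIONS))
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    blocked = not (0 <= r < len(GRID) and 0 <= c < len(GRID[0])) or GRID[r][c] == "#"
    if blocked:
        return state, -1.0, False  # stay in place, collision penalty (assumed value)
    if (r, c) == GOAL:
        return (r, c), 10.0, True  # goal reward (assumed value)
    return (r, c), -0.1, False     # small cost per move (assumed value)


def adaptive_epsilon(success_rate, floor=0.05):
    """Assumed rule: explore more while the agent rarely reaches the target."""
    return max(floor, 1.0 - success_rate)


def greedy_action(q, state):
    """Index of the action with the highest Q-value (unseen pairs count as 0)."""
    return max(range(len(ACTIONS)), key=lambda a: q.get((state, a), 0.0))


def train(episodes=3000, alpha=0.1, gamma=0.95, window=20, max_steps=200, seed=0):
    """Tabular Q-learning with a success-rate-driven exploration factor."""
    rng = random.Random(seed)
    q = {}       # (state, action index) -> estimated return
    recent = []  # 1 if an episode reached the goal, else 0 (last `window` episodes)
    for _ in range(episodes):
        rate = sum(recent) / len(recent) if recent else 0.0
        epsilon = adaptive_epsilon(rate)
        state, reached = START, False
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))  # explore
            else:
                action = greedy_action(q, state)      # exploit
            nxt, reward, done = step(state, action, rng)
            best_next = 0.0 if done else q.get((nxt, greedy_action(q, nxt)), 0.0)
            old = q.get((state, action), 0.0)
            # Temporal difference update toward reward + discounted best next value.
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if done:
                reached = True
                break
        recent.append(1 if reached else 0)
        if len(recent) > window:
            recent.pop(0)
    return q


def greedy_path(q, max_steps=50):
    """Follow the learned greedy policy from START with no slipping."""
    rng = random.Random(0)  # unused when slip=0.0, kept for a uniform step() call
    state, path = START, [START]
    for _ in range(max_steps):
        state, _, done = step(state, greedy_action(q, state), rng, slip=0.0)
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

Calling `greedy_path(train())` returns the sequence of cells visited by the learned greedy policy. No environment model is given to the agent; it only observes the outcome of each move, which matches the model-free setting described in (1). A time-based decay schedule, the traditional approach mentioned in (2), could be compared by swapping out `adaptive_epsilon`.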
Keywords/Search Tags: Unknown environment, Reinforcement learning, Path planning, Exploration and exploitation