
Research On Path Planning Algorithm Based On Reinforcement Learning

Posted on: 2022-09-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z Z Wang
Full Text: PDF
GTID: 2518306521951969
Subject: Detection Technology and Automation
Abstract/Summary:
With the development of science and technology, robot technology has gradually been applied across many industries, and path planning for mobile robots in complex environments has long been a hot topic among scholars at home and abroad. To improve the efficiency of path planning, this paper proposes a self-adaptive reinforcement-exploration Q-learning (SARE-Q) algorithm and an experience-classification multi-step DDQN (ECMS-DDQN) algorithm. The main contents of the research are as follows:

Firstly, the research background and significance of the topic are described; the concept and research status of the path planning problem and the advantages of reinforcement learning in solving it are introduced; and the basic theory of reinforcement learning is presented in detail.

Secondly, aiming at the repeated and unbalanced exploration of the Q-learning algorithm in path planning, the application of Q-learning and its improved variants to path planning is studied, and the SARE-Q algorithm is proposed by replacing the decaying ε-greedy strategy of Q-learning with a reinforcement-exploration strategy. First, the concept of behavior eligibility traces is introduced into the adaptive strategy, and the probability of each action being selected is adjusted according to these traces. Second, the decay of ε is divided into two stages: the first stage is mainly exploration, and the second is the transition from exploration to exploitation; the concept of success rate is introduced, and the exploration rate is dynamically adjusted according to it. At the same time, by maintaining a table of state visit counts, the exploration rate in the current state is dynamically adjusted according to how often that state has been visited. On this basis, a grid-map environment is built on the OpenAI Gym platform, and path planning simulation experiments are carried out with the Q-learning algorithm, the self-adaptive Q-learning (SA-Q) algorithm, and the SARE-Q algorithm. The experimental results show that, in terms of the average number of turns, the average success rate within a loop, and the number of shortest planned paths, the paths planned by SARE-Q are significantly better than those of the other two algorithms.

Thirdly, since Q-learning is limited by the sizes of the action space and state space and cannot be applied to continuous state spaces, the application of the DDQN algorithm and its improved variants to path planning is studied. To improve the accuracy of DDQN's target Q-value estimate during training, a multi-step-guidance DDQN algorithm (MS-DDQN) is proposed, which replaces the immediate reward of a single step with the rewards of several consecutive interaction steps. Aiming at the low efficiency of the standard experience replay method, a DDQN algorithm based on an experience-classification training method (EC-DDQN) is proposed. The method maintains an additional experience pool to assist training: experiences are classified according to the characteristics of their state transitions and stored in different pools; during training, the pools are sampled in a certain proportion and the samples are concatenated for training the Q-network; and the sampling proportion is dynamically adjusted according to the average training loss of the different pools. On this basis, the ECMS-DDQN algorithm is proposed to combine the advantages of the multi-step-guidance and experience-classification methods. Finally, a two-dimensional continuous map environment is built on the OpenAI Gym platform, and path planning simulation experiments are carried out with the DDQN, MS-DDQN, EC-DDQN, and ECMS-DDQN algorithms. The experimental results show that ECMS-DDQN achieves a higher total return in path planning and better generalization than the other three algorithms.

Finally, on the basis of the above research, a path planning simulation system is designed and implemented using an object-oriented development method, the cross-platform Qt development framework under Windows, and PyQt with the Python language. After testing, the simulation system runs well and meets the design requirements.
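The adaptive exploration described above (two-stage ε decay, success-rate adjustment, and per-state visit counts) can be sketched as follows. The abstract gives no formulas, so the schedule, decay shapes, and class/parameter names below are illustrative assumptions, not the thesis's actual implementation:

```python
import random
from collections import defaultdict

class AdaptiveExploration:
    """Sketch of a two-stage epsilon schedule with per-state visit counts,
    in the spirit of SARE-Q; exact formulas are assumptions."""

    def __init__(self, eps_high=0.9, eps_mid=0.5, eps_low=0.05,
                 stage1_episodes=200, stage2_episodes=400):
        self.eps_high, self.eps_mid, self.eps_low = eps_high, eps_mid, eps_low
        self.stage1, self.stage2 = stage1_episodes, stage2_episodes
        self.visits = defaultdict(int)  # state -> visit count

    def epsilon(self, episode, success_rate, state):
        if episode < self.stage1:
            # stage 1: mainly exploration, epsilon stays high
            eps = self.eps_high
        elif episode < self.stage1 + self.stage2:
            # stage 2: transition from exploration to exploitation;
            # a high recent success rate speeds up the decay
            frac = (episode - self.stage1) / self.stage2
            eps = self.eps_mid * (1 - frac) * (1 - success_rate) + self.eps_low
        else:
            eps = self.eps_low
        # rarely visited states get a small boost to their exploration rate
        bonus = 1.0 / (1 + self.visits[state])
        return min(1.0, eps + 0.1 * bonus)

    def select_action(self, q_row, state, episode, success_rate):
        """epsilon-greedy choice over one row of the Q-table."""
        self.visits[state] += 1
        if random.random() < self.epsilon(episode, success_rate, state):
            return random.randrange(len(q_row))
        return max(range(len(q_row)), key=lambda a: q_row[a])
```

With such a schedule, exploration is near-maximal early on and shrinks toward `eps_low` late in training, while states the agent has seldom visited retain a higher exploration rate than well-explored ones.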
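The multi-step target that MS-DDQN substitutes for the single-step reward can be written as a short function. The thesis does not publish its exact update rule, so this is a sketch under the usual assumptions: an n-step discounted return bootstrapped with a Double-DQN estimate at the state reached after n steps; function and argument names are illustrative:

```python
def n_step_target(rewards, gamma, q_online_next, q_target_next):
    """Multi-step Double-DQN target (illustrative sketch).

    rewards        -- the n real rewards r_t ... r_{t+n-1}
    gamma          -- discount factor
    q_online_next  -- online-network Q values at state s_{t+n} (selects action)
    q_target_next  -- target-network Q values at state s_{t+n} (evaluates it)
    """
    n = len(rewards)
    # discounted sum of the n real rewards from consecutive interactions
    g = sum((gamma ** k) * r for k, r in enumerate(rewards))
    # Double-DQN bootstrap: argmax from the online net, value from the target net
    a_star = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return g + (gamma ** n) * q_target_next[a_star]
```

Compared with the one-step target `r_t + γ·Q_target(s_{t+1}, argmax_a Q_online(s_{t+1}, a))`, the n-step version propagates real reward information faster, which is the stated motivation for MS-DDQN's more accurate target Q-value estimate.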
Keywords/Search Tags: path planning, reinforcement learning, Q-learning, DDQN, SARE-Q, ECMS-DDQN