An Optimized DQN Algorithm Based On The Memory Optimization Mechanism

Posted on: 2021-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: T X Chen
GTID: 2428330605954318
Subject: Engineering
Abstract/Summary:
Advances in science and technology are driving the rapid development of artificial intelligence. Reinforcement learning, an important branch of artificial intelligence, is increasingly widely used, especially for the navigation and exploration problems of intelligent mobile robots. Mobile robot navigation is the foundation of, and an important guarantee for, a robot's task planning: unmanned driving, intelligent drones, and intelligent air-space integration technologies all depend on the development of navigation technology. Path planning, as the basis of mobile navigation, has attracted the attention of many scholars. However, the environments that intelligent mobile robots face are complex and changeable, and traditional path planning methods cannot meet the requirement of intelligence, so intelligent path planning algorithms urgently need to be studied. At present, combining reinforcement learning with mobile robot navigation is one of the important directions for intelligent path planning algorithms. In view of this, and aiming at the problem of intelligent path planning in an unknown environment, this paper presents a reinforcement-learning-based algorithm for robot path planning. Through "trial and error", the robot intelligently explores a path that satisfies and adapts to various complex environments, so that learning and planning are completed synchronously. The research contents are as follows:

(1) To address the difficulty that traditional algorithms have with optimal path planning in unknown environments, this paper introduces the idea of the A* shortest path and proposes a DQN (Deep Q-learning Network) algorithm based on a heuristic reward function. The algorithm designs a heuristic reward function that uses distance as the evaluation criterion. As the currently executed actions explore the environment and gather information about it, the computation and feedback of the deep reinforcement learning network help the robot quickly and dynamically choose, for the current action, the option that optimizes distance, which improves the learning efficiency of the algorithm with respect to distance. Two different simulation environments were built using Python and the Tkinter module. The results show that, in a complex environment and with sufficient training, the algorithm has clear advantages over the RRT, DDPG, and original DQN algorithms: the planned path distance is shortened by 33.3%, 25.9%, and 31%, respectively (the traditional A* algorithm cannot complete the planning task). However, the algorithm has some shortfalls in search time.

(2) To address the increased time cost of the above algorithm, this paper designs an optimized DQN algorithm based on a memory optimization mechanism. The mechanism optimizes two aspects, the establishment and the update of the memory bank: 1) during the establishment phase, similar memories are reduced and unassociated memories are increased; 2) during the update of the memory bank, the minimum "TD-error" principle is applied. This ensures the maximum learning rate of the actions in the memory bank and avoids selecting and reusing repeated actions, thereby improving the learning efficiency of the algorithm and reducing the time cost of learning. Two different simulation environments were again built using Python and the Tkinter module. The results show that, after the memory optimization mechanism is introduced and the reinforcement learning network is fully trained, the search time is reduced by 14.3%, 9.1%, and 53.8% compared with the RRT, DDPG, and original DQN algorithms, respectively. The loss function curve also confirms that the learning effect of this algorithm is the best.

In summary, the improved reinforcement-learning-based path planning method proposed in this paper achieves good results in terms of both time and optimal path planning, meets the research expectations, and has strong theoretical and practical significance.
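
The two mechanisms summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the reward formula, the similarity measure, or the exact update rule. The names `heuristic_reward` and `FilteredMemory`, the reward constants, the `min_gap` threshold, and the reading of the minimum "TD-error" principle as "evict the stored transition with the smallest absolute TD error" are all assumptions made for illustration.

```python
import math
import random


def heuristic_reward(pos, next_pos, goal, hit_obstacle=False):
    """Distance-based shaped reward. All constants are illustrative assumptions."""
    if hit_obstacle:
        return -1.0
    if next_pos == goal:
        return 1.0
    # Positive when the move brings the robot closer to the goal,
    # negative when it moves away (Euclidean distance as the criterion).
    return 0.1 * (math.dist(pos, goal) - math.dist(next_pos, goal))


class FilteredMemory:
    """Replay memory that rejects near-duplicate transitions and, when full,
    evicts the transition with the smallest |TD error| (assumed rule)."""

    def __init__(self, capacity, min_gap=0.5):
        self.capacity = capacity
        self.min_gap = min_gap   # hypothetical similarity threshold
        self.items = []          # list of (abs_td_error, transition)

    def _is_similar(self, a, b):
        # A transition is (state, action, reward, next_state). Two transitions
        # count as similar when the action matches and the states are close.
        return a[1] == b[1] and math.dist(a[0], b[0]) < self.min_gap

    def add(self, transition, td_error):
        # Establishment phase: skip memories similar to one already stored.
        if any(self._is_similar(transition, t) for _, t in self.items):
            return False
        # Update phase: when full, drop the entry with the least left to learn.
        if len(self.items) >= self.capacity:
            idx = min(range(len(self.items)), key=lambda i: self.items[i][0])
            self.items.pop(idx)
        self.items.append((abs(td_error), transition))
        return True

    def sample(self, batch_size):
        # Uniform sampling over the filtered memories.
        k = min(batch_size, len(self.items))
        return [t for _, t in random.sample(self.items, k)]
```

In a training loop, each observed transition would be scored with `heuristic_reward`, offered to `FilteredMemory.add` together with its current TD error, and mini-batches for the Q-network update would be drawn with `sample`.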
Keywords/Search Tags:reinforcement learning, heuristic reward function, memory optimization mechanism, DQN algorithm