An Optimized DQN Algorithm Based On The Memory Optimization Mechanism

Posted on:2021-02-19

Degree:Master

Type:Thesis

Country:China

Candidate:T X Chen

Full Text:PDF

GTID:2428330605954318

Subject:Engineering

Abstract/Summary:

Advances in science and technology are driving the rapid development of artificial intelligence. Reinforcement learning, an important branch of artificial intelligence, is increasingly widely used, especially for the navigation and exploration problems of intelligent mobile robots. Mobile robot navigation is the foundation on which a robot completes task planning, and an important guarantee of it. For example, unmanned driving, intelligent drones, and intelligent air-space integration all depend on the development of navigation technology. Path planning, as the basis of mobile navigation, has attracted the attention of many scholars. However, the environments that intelligent mobile robots face are complex and changeable, and traditional path planning methods cannot meet the requirement of intelligence, so intelligent path planning algorithms urgently need study. At present, combining reinforcement learning with mobile robot navigation is one of the important directions for intelligent path planning algorithms. In view of this, and aiming at the problem of intelligent path planning in an unknown environment, this paper presents an in-depth study of a reinforcement-learning-based algorithm for robot path planning. Through "trial and error", the robot intelligently explores a path that satisfies and adapts to various complex environments, so that learning and planning are completed synchronously. The research contents are as follows:

(1) To address problems such as optimal path planning by traditional algorithms in unknown environments, this paper introduces the shortest-path idea of A* and proposes a DQN (Deep Q-learning Network) algorithm based on a heuristic reward function. The algorithm designs a heuristic reward function that uses distance as the evaluation criterion. As the currently executed actions explore and reveal information about the environment, the computation and feedback of the deep reinforcement learning network help the robot make quick, dynamic choices that optimize distance given the current action. These distance-optimizing choices in turn improve the algorithm's learning efficiency in distance calculation. Two different simulation environments are built with the Python language and the Tkinter module. The results show that, in a complex environment and with sufficient training, the algorithm has clear advantages over the RRT, DDPG, and original DQN algorithms: the planned path distance is shortened by 33.3%, 25.9%, and 31%, respectively (the traditional A* algorithm cannot complete the planning task). However, the algorithm falls short in search time.

(2) To address the increased time cost of the above algorithm, this paper designs an optimized DQN algorithm based on a memory optimization mechanism. The mechanism optimizes two aspects of the memory bank, its establishment and its update: 1) during the establishment phase, similar memories are reduced and unassociated memories are increased; 2) during updating, the "TD-error" minimum principle is applied. This maximizes how much is learned from the actions stored in the memory bank and avoids selecting and using repeated actions, thereby improving the learning efficiency of the algorithm and reducing the time cost of learning. Two different simulation environments are again built with the Python language and the Tkinter module. The results show that, once the memory optimization mechanism is introduced and the reinforcement learning network is fully trained, the search time is reduced by 14.3%, 9.1%, and 53.8% compared with RRT, DDPG, and the original DQN algorithm, respectively. The loss-function plot also confirms that the learning effect is the best.

In summary, the improved path planning method based on reinforcement learning proposed in this paper performs well in terms of both time and optimal path planning, which meets the research expectations and has strong theoretical and practical significance.
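To make the idea of a distance-based heuristic reward concrete, the following is a minimal sketch and not the thesis's actual reward function. It assumes a grid world, Manhattan distance to the goal as the A*-style heuristic, and illustrative constants; the names `manhattan`, `heuristic_reward`, `goal_reward`, `obstacle_penalty`, and `scale` are all hypothetical.

```python
from typing import Tuple

Cell = Tuple[int, int]  # (row, column) on an assumed grid map


def manhattan(a: Cell, b: Cell) -> int:
    """Manhattan distance, used here as an assumed A*-style heuristic."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def heuristic_reward(
    prev: Cell,
    curr: Cell,
    goal: Cell,
    hit_obstacle: bool,
    goal_reward: float = 1.0,        # assumed terminal reward
    obstacle_penalty: float = -1.0,  # assumed collision penalty
    scale: float = 0.1,              # assumed weight of the distance term
) -> float:
    """Reward a step by how much it reduces the distance to the goal."""
    if hit_obstacle:
        return obstacle_penalty
    if curr == goal:
        return goal_reward
    # Positive when the step moves closer to the goal, negative when it moves away.
    return scale * (manhattan(prev, goal) - manhattan(curr, goal))
```

With these assumed constants, a step from (0, 0) to (0, 1) toward a goal at (3, 3) reduces the distance from 6 to 5 and earns a small positive reward, while the reverse step earns the same amount as a penalty. This gives the agent distance feedback on every action instead of only at the goal.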
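The two memory-bank rules in (2) can be sketched as follows. The abstract does not spell out the details, so this is one possible reading under stated assumptions: a transition counts as "similar" when it has the same action and a nearly identical state, and the "TD-error" minimum principle is read as replacing the stored transition with the smallest absolute TD error when the bank is full. The TD error itself is the usual quantity, the target r + γ·max Q(s', a') minus the current estimate Q(s, a). The class `MemoryBank` and its parameters are hypothetical.

```python
from typing import List, Tuple

# (state, action, reward, next_state); the layout is an assumption for illustration.
Transition = Tuple[Tuple[float, ...], int, float, Tuple[float, ...]]


class MemoryBank:
    """Replay memory that rejects near-duplicates and evicts low-TD-error entries."""

    def __init__(self, capacity: int, similarity_tol: float = 1e-6) -> None:
        self.capacity = capacity
        self.similarity_tol = similarity_tol  # assumed similarity threshold
        self.items: List[Tuple[Transition, float]] = []  # (transition, |TD error|)

    def _is_similar(self, a: Transition, b: Transition) -> bool:
        # Assumed similarity test: same action and almost the same state.
        same_action = a[1] == b[1]
        state_gap = max((abs(x - y) for x, y in zip(a[0], b[0])), default=0.0)
        return same_action and state_gap <= self.similarity_tol

    def add(self, transition: Transition, td_error: float) -> bool:
        """Try to store a transition; return True if it was stored."""
        priority = abs(td_error)
        # Establishment rule: skip memories similar to ones already stored.
        if any(self._is_similar(transition, t) for t, _ in self.items):
            return False
        if len(self.items) < self.capacity:
            self.items.append((transition, priority))
            return True
        # Update rule (assumed reading): replace the entry with the smallest
        # |TD error|, but only if the new transition has a larger one.
        idx = min(range(len(self.items)), key=lambda i: self.items[i][1])
        if priority <= self.items[idx][1]:
            return False
        self.items[idx] = (transition, priority)
        return True
```

Under this reading, the bank keeps the transitions from which the network still has the most to learn, which is one way the mechanism could reduce training time. Sampling from the bank and refreshing stored TD errors after each training step are left out of the sketch.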

Keywords/Search Tags:

reinforcement learning, heuristic reward function, memory optimization mechanism, DQN algorithm

Related items

1	Research On Reward Optimization In Reinforcement Learning
2	Reward Mechanism Research Of Reinforcement Learning-based Continuous Integration Test Case Prioritization
3	Reward Of Reinforcement Learning Of Test Optimization For Continuous Integration
4	Research On Reward Function Of Reinforcement Learning In Continuous Integration Testing
5	Optimization Of TCP Reinforcement Learning Method For Continuous Integration
6	Research On Sample Generation And Selection Methods For Deep Reinforcement Learning
7	The Improvement And Application Of Reinforcement Learning Algorithm Research
8	Study Of Robot Arm Control Based On Deep Reinforcement Learning
9	Research And Application Of Deep Reinforcement Learning Algorithms Based On Reward Shaping
10	Robot Navigation Algorithm Based On Reinforcement Learning In Unknown Environment