
Low-Cost Navigation Based on Reinforcement Learning for Autonomous Vehicles

Posted on: 2021-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: T Wang
Full Text: PDF
GTID: 2392330602486027
Subject: Control Science and Engineering
Abstract/Summary:
With the rapid development of artificial intelligence technology, a technological revolution in the automotive industry is imminent. In recent years, autonomous driving vehicles, as a combination of the two, have attracted growing attention from both academia and industry. As the core component of autonomous driving, the navigation algorithm aims to provide an accurate trajectory that the vehicle can follow while avoiding obstacles on the way to its destination. However, many existing navigation algorithms rely heavily on detailed high-precision prior maps and high-precision positioning equipment, both of which are easily invalidated when the map changes dynamically or when GPS signals are blocked by tall buildings and trees. Moreover, the cost of collecting and maintaining high-precision maps, and of high-precision positioning equipment, has greatly hindered the large-scale adoption of autonomous vehicles.

Based on these considerations, this paper proposes a "low-cost" navigation algorithm based on reinforcement learning that removes the dependence on high-precision maps and high-precision positioning equipment. Building on the Deep Deterministic Policy Gradient (DDPG) algorithm, this paper redesigns the network structure and the algorithm's inputs and outputs, realizing a mapping from global reference waypoints and low-dimensional obstacle information around the vehicle to the vehicle's front-wheel steering angle. The data is pre-processed: normalized obstacle information and reference waypoints, expressed in the vehicle body coordinate frame, serve as the actual network input, which significantly accelerates convergence. To eliminate the reliance on high-precision maps and positioning, dedicated reward functions are designed for the individual tasks, such as tracking the reference trajectory and avoiding obstacles, so that the vehicle interacts with the environment and learns to avoid obstacles while driving toward the destination without depending heavily on global reference information.

For the training process, this paper also proposes an "easy first, then difficult" learning strategy that gradually increases the difficulty of the environment: the vehicle first learns to track reference waypoints, then to avoid obstacles while tracking accurate reference waypoints, and finally to complete the navigation task even when the reference waypoints are inaccurate, the positioning is inaccurate, or the positioning result is lost altogether. This strategy prevents the vehicle from being "overwhelmed" by the combined effect of the various reward functions at the start of training and failing to converge for a long time. Because the simulation environment models the vehicle kinematics and vehicle geometry, the vehicle learns to output the optimal front-wheel angle under these constraints. Simulation results show that, after sufficient training, the vehicle can successfully navigate to its destination even if the map is inaccurate, the positioning is inaccurate, or the positioning signal is lost.
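The abstract describes this pre-processing only in words; the following minimal NumPy sketch illustrates one way the body-frame transform and normalization could look (all names, the scale constant, and the flat-vector layout are assumptions, not taken from the thesis):

```python
import numpy as np

def to_body_frame(points_xy, vehicle_xy, vehicle_yaw):
    """Rotate and translate global (x, y) points into the vehicle body frame."""
    c, s = np.cos(vehicle_yaw), np.sin(vehicle_yaw)
    rot = np.array([[c, s], [-s, c]])  # world -> body rotation, R(-yaw)
    return (np.asarray(points_xy) - np.asarray(vehicle_xy)) @ rot.T

def build_observation(waypoints, obstacles, vehicle_xy, vehicle_yaw, scale=20.0):
    """Concatenate normalized body-frame waypoints and obstacle positions
    into the flat state vector fed to the actor and critic networks.
    The scale constant is an illustrative normalization, not the thesis value."""
    wp = to_body_frame(waypoints, vehicle_xy, vehicle_yaw) / scale
    ob = to_body_frame(obstacles, vehicle_xy, vehicle_yaw) / scale
    return np.concatenate([wp.ravel(), ob.ravel()]).astype(np.float32)
```

Likewise, the "easy first, then difficult" strategy can be pictured as a staged curriculum; the stage parameters below are purely illustrative placeholders for the difficulty knobs the abstract mentions (obstacle density, waypoint accuracy, positioning noise and dropout):

```python
# Hypothetical curriculum stages; counts and noise levels are invented
# for illustration and do not come from the thesis.
CURRICULUM = [
    dict(n_obstacles=0, waypoint_noise=0.0, gps_noise=0.0, gps_dropout=0.0),  # pure tracking
    dict(n_obstacles=8, waypoint_noise=0.0, gps_noise=0.0, gps_dropout=0.0),  # add avoidance
    dict(n_obstacles=8, waypoint_noise=2.0, gps_noise=0.5, gps_dropout=0.0),  # coarse map/positioning
    dict(n_obstacles=8, waypoint_noise=2.0, gps_noise=0.5, gps_dropout=0.3),  # intermittent loss
]

def stage_for(episode, episodes_per_stage=2000):
    """Advance one stage every `episodes_per_stage` episodes, then stay at the hardest."""
    return CURRICULUM[min(episode // episodes_per_stage, len(CURRICULUM) - 1)]
```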
Meanwhile, this paper further applies the algorithm to an actual vehicle platform. Considering the large gap between the simulated vehicle model and the actual vehicle system, it is not reasonable to use the front-wheel angle computed by the network directly as the control input of the actual vehicle. Therefore, this paper borrows the concept of "parallel driving" and uses the path as the bridge between the simulation environment and the actual vehicle system: the virtual vehicle drives a certain distance in the imagined environment, and its driving trajectory is then taken as the path-planning result for the actual vehicle. In this way, the motion planner is converted into a path planner. Finally, this paper carries out an actual vehicle experiment in a densely wooded area. To simulate more extreme situations, random noise is artificially superimposed on the positioning results. Moreover, the DDPG-based algorithm is compared with the discrete optimization methods commonly used in autonomous driving. The actual vehicle experiments show that the method effectively removes the dependence on high-precision maps and high-precision positioning equipment. Furthermore, because the constraints of vehicle kinematics and vehicle geometry are taken into account when producing the optimal path, the planning result is closer to the actual trajectory of the vehicle, which avoids planning failures caused by inconsistencies between the actual motion and the planned result.

In summary, the main contributions of this paper are as follows:

1. A "low-cost" navigation algorithm based on the DDPG reinforcement learning algorithm is proposed, with a redesigned network structure and redesigned input and output variables. A data pre-processing scheme is designed to speed up network learning, and the reward function is redesigned for the specific problem. By interacting with the environment, the vehicle learns to use the approximate driving direction given by inaccurate reference waypoints and inaccurate positioning results to complete the navigation task.

2. For the training process, an "easy first, then difficult" learning strategy is proposed. It prevents the vehicle from being "overwhelmed" by the various reward functions at the initial stage and failing to converge for a long time. Meanwhile, the vehicle kinematics and vehicle geometry are modeled in the training environment, so the vehicle learns to output the optimal action under these constraints.

3. Inspired by the idea of "parallel driving", a method is proposed that connects the simulated environment and the actual vehicle system through the bridge of paths, realizing the transition from the simulated vehicle model to the actual vehicle system (a sketch follows this list). Since the model trained in simulation accounts for the vehicle kinematics and vehicle geometry, the planning result is closer to the actual vehicle trajectory.
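As a rough illustration of this path bridge (the thesis provides no code here; `actor.act`, `virtual_vehicle.step`, and `virtual_vehicle.pose` are hypothetical interfaces), the trained policy could be rolled out on the simulated vehicle model for a short horizon and the resulting trajectory handed to the real vehicle as its planned path:

```python
def plan_path(actor, virtual_vehicle, build_obs, horizon=50):
    """Roll the trained DDPG actor forward on the simulated vehicle model
    and return the visited poses; the real vehicle then tracks this path
    with its own low-level controller. All interfaces are assumed."""
    path = [virtual_vehicle.pose()]
    for _ in range(horizon):
        steer = actor.act(build_obs(virtual_vehicle))  # front-wheel angle
        virtual_vehicle.step(steer)                    # kinematic model update
        path.append(virtual_vehicle.pose())
    return path
```

Because every step of the rollout obeys the simulated kinematic and geometric model, the returned path is approximately feasible for the real vehicle by construction, which is the property the thesis credits for keeping the planning result close to the actual trajectory.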
Keywords/Search Tags: autonomous driving, navigation, reinforcement learning, low-cost, parallel driving