
Research On UAV Obstacle Avoidance Technology In Dynamic Environment

Posted on: 2020-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: S X Mu
Full Text: PDF
GTID: 2392330590474299
Subject: Electronic and communication engineering
Abstract/Summary:
With the continuous development of UAV technology and its related industries, UAVs are increasingly used in reconnaissance, agriculture, logistics, and entertainment. As flight environments grow more complex, autonomous obstacle avoidance has become a necessary capability for modern UAVs. Autonomous obstacle avoidance decision-making is a typical agent decision problem, and existing traditional decision-making methods struggle to achieve high performance. This thesis therefore introduces deep reinforcement learning into the autonomous obstacle avoidance decision-making process.

First, the UAV obstacle avoidance problem is modeled as an action decision problem in a changing environment. The coordinates of the obstacles and the agent, together with the dynamically changing environment, are vectorized as the input of a deep reinforcement learning network, and the output of the network is mapped to the agent's actions. During training, the agent receives different rewards for different actions, and the backpropagation algorithm uses these rewards to update the network parameters. By interacting with the environment, the agent continuously learns to make decisions autonomously.

A UAV obstacle avoidance algorithm with a single-network structure produces overly optimistic estimates, because the maximum estimated value is used repeatedly when estimating action values, so a positive error accumulates. The single-network structure is therefore converted into a dual-network structure in which optimal action selection and action value estimation are decoupled during training. This reduces the overestimation problem of the single-network algorithm and improves obstacle avoidance performance.

A replay unit that stores interaction experience is introduced into the parameter update stage: historical experience is drawn from it to break the time-correlation of the samples used to update the network parameters. Finally, the experience replay algorithm is improved, and a new exponential-function-based prioritized experience replay deep Q-learning algorithm is proposed. By redesigning the mapping from the importance of historical experience to its extraction probability, the algorithm automatically selects the more important replay units for learning. Compared with traditional algorithms, the proposed algorithm not only ensures decision quality, enabling the agent to learn the optimal strategy, but also effectively improves task performance and decision-making efficiency.

In the simulation experiments, the intuitive model strategy of the improved algorithm is first analyzed, followed by a cost-function analysis, an efficiency analysis, and a task performance comparison of each algorithm. Finally, by comparing the simulation results of three replay algorithms in a test environment (a pixel game) and in a UAV obstacle avoidance simulation environment, it is shown that the proposed dual-network, exponential-function-based prioritized experience replay deep Q-learning algorithm enables the agent to achieve better performance, that is, to make more efficient and higher-quality decisions in less time.
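To make the modeling step described in the abstract concrete, the following minimal sketch (in PyTorch) vectorizes the agent and obstacle coordinates into an input vector, maps the Q-network output to a discrete action space, and updates the parameters by backpropagation from the reward. The state layout, network width, action count, and hyperparameters are illustrative assumptions, not values taken from the thesis.

import torch
import torch.nn as nn

# Hypothetical state layout: agent (x, y) plus N obstacle (x, y) pairs,
# flattened into a single input vector as the abstract describes.
N_OBSTACLES = 4
STATE_DIM = 2 + 2 * N_OBSTACLES
N_ACTIONS = 8  # e.g. eight discrete heading changes (an assumption)

q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, N_ACTIONS),
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def td_update(state, action, reward, next_state, done, gamma=0.99):
    """One Q-learning backpropagation step on a single transition."""
    q_pred = q_net(state)[action]
    with torch.no_grad():
        # Reward plus discounted value of the best next action.
        q_target = reward + gamma * (1.0 - done) * q_net(next_state).max()
    loss = nn.functional.mse_loss(q_pred, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Example update on a placeholder transition:
s, s2 = torch.randn(STATE_DIM), torch.randn(STATE_DIM)
td_update(s, action=3, reward=torch.tensor(1.0),
          next_state=s2, done=torch.tensor(0.0))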
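The dual-network decoupling the abstract describes, in which one network selects the best next action and a second network evaluates it, matches the standard Double-DQN-style target; the sketch below uses that form, with the same assumed dimensions as the previous sketch, as one plausible reading of the thesis's structure.

import copy
import torch
import torch.nn as nn

# Online and target networks (same assumed architecture as above).
q_net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 8))
target_net = copy.deepcopy(q_net)  # periodically re-synced copy

def double_q_target(reward, next_state, done, gamma=0.99):
    """Decouple action *selection* (online net) from action *evaluation*
    (target net) to damp the cumulative positive bias that a single
    network incurs by repeatedly taking its own maximum."""
    with torch.no_grad():
        best_action = q_net(next_state).argmax()      # selection
        q_eval = target_net(next_state)[best_action]  # evaluation
        return reward + gamma * (1.0 - done) * q_eval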
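The replay unit can be illustrated with a minimal buffer that stores transitions and samples them uniformly at random; the random sampling is what breaks the time-correlation of consecutive interaction samples. The capacity and interface below are assumptions.

import random
from collections import deque

class ReplayBuffer:
    """Stores interaction experience; uniform random sampling breaks
    the time-correlation of consecutive transitions."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)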
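The abstract does not spell out the exact exponential mapping from experience importance to extraction probability, so the sketch below is only one plausible reading: priorities proportional to exp(beta * |TD error|), sampled with replacement. The function name, the beta parameter, and this softmax-style form are all assumptions.

import math
import random

def sample_prioritized(buffer, td_errors, batch_size, beta=1.0):
    """Hypothetical exponential priority: weight_i = exp(beta * |delta_i|).
    random.choices normalizes the weights internally, so more important
    replay units are extracted with higher probability."""
    weights = [math.exp(beta * abs(d)) for d in td_errors]
    return random.choices(buffer, weights=weights, k=batch_size)

An exponential mapping of this kind sharpens the contrast between high- and low-importance experiences relative to a proportional scheme, which is consistent with the abstract's claim that the algorithm automatically concentrates learning on the more important replay units.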
Keywords/Search Tags: moving obstacle avoidance, UAV, agent decision, deep Q-learning, experience replay