Research On Deep Reinforcement Learning Based UCAV Decision-making Approaches

Posted on: 2021-03-07  Degree: Doctor  Type: Dissertation
Country: China  Candidate: S X You  Full Text: PDF
GTID: 1362330605480312  Subject: Information and Communication Engineering
Abstract/Summary:
The recent interest in integrated combat systems stems from technologies related to cognitive electronic warfare (CEW). The unmanned combat air vehicle (UCAV), currently the best decision-making platform in the military field, can no longer cope with the complicated situational dynamics of combat by manual means alone. On the model side, the confidentiality of field trial data means that simulation results cannot receive effective feedback from reality. On the algorithm side, the low autonomy of intelligent decision-making algorithms prevents the UCAV from adapting to confrontational environments with ever-changing demands. This thesis conducts systematic research on sequential combat decision-making problems.
To solve the decision-making problem for end-to-end navigation tasks under prior near-ignorance, a real-time motion framework based on a 3D situation space is proposed. First, a suitable filtering algorithm is used to track and predict the motion states of the target, and the situation space is then defined by analyzing the relative characteristics between the UCAV and the target in order to evaluate the maneuverable clear space. Second, an approach for generating waypoints with a uniform acceleration model in a constrained coordinate system is designed using the vector-based navigation idea of the artificial potential field method. Finally, based on the judgement of threats, one-step situational prediction is used to produce the tracking and collision avoidance decisions for the UCAV. Simulation results show that, under the premise of soft-landing tracking, the proposed framework can plan a smooth and flyable trajectory in real time using only attitude angles as input.
A navigation decision method incorporating deep reinforcement learning (DRL) is proposed to address the UCAV's low adaptability to the environment in traditional target searching tasks. First, based on the UCAV's underactuated motion model, a complete CEW framework and a new target searching environment, Explorer, are constructed in Python. Second, the state representation of the target is estimated with two neural networks by combining partially observable Markov decision processes with auto-encoding variational Bayes. Finally, careful reward shaping allows the deep deterministic policy gradient (DDPG) algorithm to be adapted to the task environment. Experimental results show that the proposed DRL framework allows the agent to successfully learn optimal control strategies from the latent state space and to output end-to-end continuous action decisions.
A DDPG-based continuous-action navigation framework for UCAVs is proposed to overcome the robustness problem of traditional target tracking decision-making methods in 3D dynamic space. First, a target tracking environment, Tracker, is created with the CEW framework, and a theoretical analysis is made of the maneuver bias caused by observation errors of the environment. Second, the behavior rewards of the agent, inspired by vector-based navigation, are carefully designed to ensure that the decision output of the DDPG is reliable. Finally, the DRL framework is validated in simulation on various target tracking tasks, with good results. In terms of behavior evaluation, the agile maneuvering strategies mastered by the agent are dissected by pattern segmentation of a high-quality trajectory.
A threat evaluation and jamming allocation system based on spatial variation is proposed in response to the strong dependence on prior knowledge and the temporal discontinuity of decisions in the traditional jamming-radar process. First, the interactions in a jamming-radar process are analyzed theoretically: between radar stages and jamming techniques, among different jamming techniques, and among different radar stages. Threat evaluation is then performed based on observable target characteristics, with an objective function aimed at minimizing the danger value of the radar network. Finally, a new evolutionary computation approach is proposed to optimize the cooperative jamming decisions. After an independent test of the jamming module, joint debugging with the DRL-based navigation framework is carried out to verify that the agent can quickly make integrated navigation and confrontation decisions in a dynamic space.
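The waypoint-generation step described in the abstract (vector-based navigation from the artificial potential field method, combined with a uniform acceleration model) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The function name `next_waypoint`, the gains `k_att` and `k_rep`, the influence radius, and the acceleration limit `a_max` are all hypothetical, and the target filtering and the constrained coordinate system mentioned in the abstract are omitted.

```python
import math

def _add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _scale(a, k):
    return tuple(x * k for x in a)

def _norm(a):
    return math.sqrt(sum(x * x for x in a))

def next_waypoint(pos, vel, target, obstacles, dt=0.1,
                  k_att=1.0, k_rep=5.0, influence=10.0, a_max=3.0):
    """One planning step (hypothetical sketch, not the thesis's method).

    pos, vel, target: 3D tuples; obstacles: list of 3D tuples.
    Returns the next position and velocity under constant acceleration.
    """
    # Attractive term: commanded acceleration points toward the target.
    acc = _scale(_sub(target, pos), k_att)

    # Repulsive term: the usual potential-field gradient, active only
    # inside the assumed influence radius of each obstacle.
    for obs in obstacles:
        away = _sub(pos, obs)
        d = _norm(away)
        if 0.0 < d < influence:
            mag = k_rep * (1.0 / d - 1.0 / influence) / (d * d)
            acc = _add(acc, _scale(away, mag / d))

    # Clamp the commanded acceleration to an assumed platform limit.
    n = _norm(acc)
    if n > a_max:
        acc = _scale(acc, a_max / n)

    # Uniform-acceleration kinematics over one time step:
    # p' = p + v*dt + 0.5*a*dt^2,  v' = v + a*dt
    new_pos = _add(_add(pos, _scale(vel, dt)), _scale(acc, 0.5 * dt * dt))
    new_vel = _add(vel, _scale(acc, dt))
    return new_pos, new_vel
```

Calling `next_waypoint` repeatedly with the latest predicted target position would yield a sequence of waypoints. The clamp on acceleration is one simple way to keep the resulting path flyable, chosen here only for illustration.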
Keywords/Search Tags: Deep reinforcement learning, Target tracking, Target searching, Cooperative jamming, Evolutionary computation