
Research On Deep Reinforcement Learning Design Method For Multi-Constraint Guidance Control

Posted on: 2024-09-28  Degree: Master  Type: Thesis
Country: China  Candidate: D H Dou  Full Text: PDF
GTID: 2532307142451454  Subject: Electronic information
Abstract/Summary:
This thesis is based on research into small-scale guided munitions and employs deep reinforcement learning (DRL) to investigate aerodynamic identification, controller design, and multi-constraint guidance.

An adaptive genetic algorithm-back propagation neural network (AGA-BPNN) aerodynamic identification model is designed by combining a genetic algorithm with a neural network; it acquires aerodynamic parameters online while augmenting the aerodynamic data. The model takes the angle of attack, Mach number, and rudder deflection angle as inputs and the corresponding aerodynamic coefficients as outputs. The genetic algorithm adaptively adjusts its crossover and mutation probabilities to tune the initial weights and thresholds of the neural network, which effectively prevents the identification results from falling into local optima. After training, the derivative characteristics of the neural network are exploited to further identify the aerodynamic derivatives of the corresponding coefficients, providing the basis for subsequent control-system design and trajectory calculation.

To address the problem that the traditional linear autopilot design process is cumbersome and struggles to meet performance requirements over the full flight envelope, a two-loop autopilot based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is proposed. A deep reinforcement learning model of the two-loop autopilot is constructed, with flight state information as the state and the autopilot control parameters to be designed as the action. A reward function is designed to constrain the stability margin of the system. The TD3 algorithm learns the autopilot control parameters offline over the full flight envelope, yielding a fitted model that can be applied directly in the guidance loop. The fitted model is verified online on a pitch-constrained guidance problem, and simulation results show that the proposed two-loop autopilot
can adjust the control parameters in real time according to the flight state, ensuring attitude stability while accurately tracking acceleration commands.

To address the multi-constraint guidance problem, this study uses the Proximal Policy Optimization (PPO) algorithm to design terminal-angle-constrained DRL guidance control strategies for both a guidance-control feedback loop and an integrated guidance-control system. A Markov decision process (MDP) is constructed that accounts for the full dynamics of the missile body and the influence of the control actuators. The real-time angle error is added to the state vector, and the normal acceleration and rudder deflection angle are bounded. A reward function is designed that reduces the missile-target distance while correcting the pitch-angle error and mitigating the sparse-reward problem. A Beta distribution is used for policy sampling to eliminate the negative effect of unbounded distributions on a bounded action space, and an entropy regularization term is introduced to encourage diverse action exploration. The pitch-angle-constrained DRL strategy is further augmented with a threshold switching control that enforces the seeker field-of-view constraint by correcting the seeker look angle. Simulations and Monte Carlo hit tests in multiple scenarios demonstrate the effectiveness, generality, and robustness of the proposed method.
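As a concrete illustration of the Beta-distribution policy sampling described above, the following is a minimal sketch, not the thesis code: a Beta policy head for a bounded rudder-deflection action, with the log-density and differential entropy used for entropy regularization. The rudder limit, the softplus-plus-one parameter mapping, and the function names are illustrative assumptions.

```python
import math
import random

RUDDER_MAX = 20.0  # assumed rudder-deflection limit in degrees (illustrative)

def softplus(x):
    # Numerically stable softplus, log(1 + exp(x)).
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def beta_log_prob(a, b, u):
    # Log-density of Beta(a, b) at u in (0, 1).
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1.0) * math.log(u) + (b - 1.0) * math.log(1.0 - u) - log_B

def beta_entropy(a, b):
    # Differential entropy of Beta(a, b): the bonus term added to the PPO
    # objective to encourage exploration. Digamma is approximated by a
    # finite difference on lgamma (illustrative only).
    def digamma(z, h=1e-6):
        return (math.lgamma(z + h) - math.lgamma(z - h)) / (2.0 * h)
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (log_B - (a - 1.0) * digamma(a) - (b - 1.0) * digamma(b)
            + (a + b - 2.0) * digamma(a + b))

def sample_rudder(alpha_logit, beta_logit, rng=random):
    # Map unconstrained network outputs to Beta parameters (> 1, so the
    # density is unimodal and no mass piles up at the action bounds),
    # sample u in (0, 1), then scale to the symmetric physical range.
    a = 1.0 + softplus(alpha_logit)
    b = 1.0 + softplus(beta_logit)
    u = rng.betavariate(a, b)
    action = (2.0 * u - 1.0) * RUDDER_MAX
    return action, beta_log_prob(a, b, u)
```

Because the Beta support is exactly (0, 1), every sampled action maps inside the physical rudder limits, unlike a Gaussian policy whose samples must be clipped.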
Keywords/Search Tags: Deep reinforcement learning, Autopilot, Multi-constraint guidance, Aerodynamic identification, Policy gradient