| The quadrotor uncrewed aerial vehicle (UAV) is highly coupled, under-actuated, and nonlinear, which makes it difficult for model-based controller design to deliver efficient and stable control in unknown dynamic environments. The main reason is that building a complete and comprehensive model is itself a challenge. Recently, the application of data-driven learning to complex control tasks has attracted much attention. Reinforcement learning (RL) can learn from an imprecise model of the controlled object, updating and optimizing its control strategy with data generated by interaction with the environment, which offers a new approach to the above problem. In addition, because limited battery capacity restricts UAV endurance, this paper uses reinforcement learning to optimize energy consumption on top of stable attitude control, so as to improve the UAV's sustained flight capability. The main research work of this paper is as follows: (1) For UAV attitude control, the attitude angular velocity tracking error is mapped to the state space of a Markov decision process, a variety of reward functions are designed, and the motor control signal is simplified to an energy proportion in the interval [0, 1] that serves as the action space; together these establish the reinforcement learning model of attitude control. A simulation platform is built on the flight simulation framework Gym FC. In simulation, the Proximal Policy Optimization (PPO) algorithm is used to examine and compare how various reward settings and related parameters affect training and attitude control, and the best-performing reward calculation method is selected accordingly. Twin Delayed Deep Deterministic Policy Gradient (TD3) and PPO both produce training results that differ to varying degrees across runs on this task, which leads to poor experimental repeatability; to address this, a UAV attitude control method based on the Soft Actor-Critic (SAC) algorithm combined with the maximum entropy framework is proposed. The results and comparison of repeated experiments show that this method helps improve experimental repeatability while solving the attitude control problem. (2) An SAC-based method for optimizing control energy consumption is proposed, which aims to reduce power consumption on the basis of attitude control and thereby improve the endurance of the quadrotor. The experimental scheme is designed from an analysis of the simulation requirements of energy consumption optimization. The experimental environment mainly comprises a lithium battery model, the extension of the UAV with that battery model, the training design for energy consumption optimization, and the information interaction between simulation platform modules. First, the UAV is extended with the battery model by implementing the battery model and coding its Gazebo plugin and its status information. Second, according to the training requirements, the observation space and the reward calculation of the reinforcement learning model are adjusted to determine the required battery status information and to realize information interaction between the modules of the simulation platform. Finally, the platform is tested and verified before the experiments, and a comparison of actual and expected data shows that it behaves as configured. The training, evaluation, and comparison results of the energy consumption optimization experiment show that the optimized controller achieves stable control performance while reducing energy consumption, which verifies the effectiveness of the method. |
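
The reinforcement learning formulation summarized in item (1), and the energy-aware reward adjustment in item (2), can be illustrated with a minimal Python sketch. The abstract does not give the reward formulas actually used, so everything below is an assumption made for illustration: the function names `tracking_error`, `clip_action`, and `reward`, the absolute-error tracking penalty, and the `energy_weight` parameter are all hypothetical, and the mean motor command is used only as a stand-in for energy use.

```python
from typing import List, Sequence


def tracking_error(target_rates: Sequence[float],
                   measured_rates: Sequence[float]) -> List[float]:
    """State: per-axis angular velocity tracking error (target minus measured)."""
    return [t - m for t, m in zip(target_rates, measured_rates)]


def clip_action(raw_action: Sequence[float]) -> List[float]:
    """Action: one energy proportion per motor, limited to the interval [0, 1]."""
    return [min(1.0, max(0.0, a)) for a in raw_action]


def reward(error: Sequence[float],
           action: Sequence[float],
           energy_weight: float = 0.0) -> float:
    """Hypothetical reward: negative tracking error plus an optional energy term.

    energy_weight = 0.0 gives a pure attitude-tracking reward; a positive
    value also penalizes the mean motor command (an assumed energy proxy).
    """
    tracking_penalty = sum(abs(e) for e in error)
    energy_penalty = energy_weight * sum(action) / len(action)
    return -(tracking_penalty + energy_penalty)
```

With `energy_weight` left at zero the reward depends only on the tracking error, which corresponds to the attitude control task; raising it trades tracking accuracy against motor effort, which is one plausible way to express the energy consumption objective. The thesis may instead derive its energy term from the battery status information exposed by the simulation platform.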