The wide application of aircraft, especially UAVs (Unmanned Aerial Vehicles), in production and daily life has made autonomous flight control a core technology and a major, difficult problem in the field. How to design and implement an efficient, high-quality autopilot control system has become one of the key issues in the design and manufacture of a new generation of intelligent aircraft. At present, most flight-control research focuses either on manual control at low latency, or on hard-coding the mission path so that the aircraft follows a planned route to complete its task. Neither type of control technology can be separated from human control and planning; both lack autonomy in the true sense, which greatly limits the application scenarios of missions and reduces the efficiency of the aircraft.

This thesis studies autonomous flight of aircraft with limited computation and limited bandwidth on tasks in complex environments. In a UAV's flight mission, the drone must, on the one hand, perceive changes in the surrounding scene to ensure its own safety, and on the other hand, complete the mission goal set by the controller in as little time as possible. With the development of computing technology, deep learning has achieved remarkable results in fields such as computer vision and reinforcement learning. Deep learning models are structurally simple and learnable end to end, so autonomous flight-control technology based on deep reinforcement learning is theoretically feasible and advanced. This thesis applies deep reinforcement learning to achieve autonomous control of drones. The specific work of this thesis is as follows:

1) From the perspective of neural-network structure, a DDPG (Deep Deterministic Policy Gradient) network that integrates the attention mechanism is designed. A multi-actor, single-critic structure is used: the action is determined by multiple actors jointly, and the weight of each actor is determined by an attention model. Experimental results show that the attention model can accelerate reinforcement-learning training while maintaining, or even significantly improving, the final performance of reinforcement learning.

2) Reinforcement learning can learn in the environment and gain new experience, but when the policy is not yet good enough or has not converged, these experiences usually yield a low average return or a high variance in returns. Reinforcement learning generally lacks a good policy in its initial state, making it difficult to learn useful knowledge. Human experience, i.e., manually collected demonstration data, is of guaranteed quality. Human experience is therefore injected through two methods, parameter pre-training and a prioritized human-experience replay pool, which reduces training time and improves the reliability of reinforcement learning.

3) To obtain large amounts of multi-modal data, and at the same time to speed up the training of the reinforcement-learning network, a distributed learning framework is established in this thesis; the parameter-transmission method in the distributed framework is also used to reduce overfitting. To explore more effectively, Ornstein–Uhlenbeck noise is used in place of the conventional noise method. On the basis of the above work, the UAV autonomously flew to the vicinity of the target point in the AirSim simulation environment and achieved a high success rate.
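As a concrete illustration of the exploration scheme mentioned in 3), the following is a minimal sketch of an Ornstein–Uhlenbeck noise process as commonly used with DDPG-style agents. The parameter values (theta=0.15, sigma=0.2) and the 4-dimensional action size are illustrative assumptions, not values taken from this thesis.

```python
import numpy as np

class OrnsteinUhlenbeckNoise:
    """Temporally correlated exploration noise: dx = theta*(mu - x)*dt + sigma*dW.

    Unlike independent Gaussian noise, consecutive samples are correlated,
    which suits physical control tasks such as UAV flight.
    """
    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=0):
        self.mu = mu * np.ones(size)      # long-run mean of the process
        self.theta = theta                # mean-reversion rate
        self.sigma = sigma                # diffusion (noise) scale
        self.dt = dt                      # discretization time step
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        # Restart the process at its long-run mean, e.g. at episode boundaries.
        self.x = self.mu.copy()

    def sample(self):
        # Euler-Maruyama discretization of the OU stochastic differential equation.
        self.x = self.x + self.theta * (self.mu - self.x) * self.dt \
                 + self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.mu.shape)
        return self.x

# Usage: perturb the deterministic policy output during training.
noise = OrnsteinUhlenbeckNoise(size=4)   # hypothetical 4-dimensional control action
perturbation = noise.sample()            # added to the actor's action for exploration
```

The mean-reversion term pulls the noise back toward zero over time, so the exploration perturbation drifts smoothly rather than jumping independently at every step.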