| Carrier-based aircraft are an important part of the fighting force of carrier vessels (CVs). The safe take-off and landing of carrier-based aircraft has always been both the key to, and a difficult point in, carrying out combat missions smoothly in the combined carrier and aircraft system. China has already achieved aided landing; however, aided landing technology depends heavily on favorable meteorological conditions, and the difficulty of training landing signal officers, among other factors, hinders its development. For automatic carrier landing, China is still at the theoretical research stage. Automatic landing is a sequential decision and control problem, and reinforcement learning has successful precedents and natural advantages in optimal control and sequential decision making. This thesis explores the application of reinforcement learning to carrier-based aircraft, provides a method for automatic carrier landing control based on deep reinforcement learning, and discusses the actor-critic algorithm for the landing problem. The main work of the thesis is as follows: (1) This thesis designs an actor-critic algorithm for the carrier landing problem. Against the specific background of the carrier aircraft landing mission, we study the state space, action space and reward function of the automatic landing process, adopting actor-critic and deterministic policy gradient theory without relying on a control model or a dynamic model. A Markov decision process model that meets the requirements of the problem setting is also given. (2) To address the sparsity of the reward function, this thesis proposes a reward shaping model that effectively solves the problem of reward sparseness in the landing process. The flight simulation software X-Plane is used as a reinforcement learning experimental environment for the first time, with the F/A-18 fighter as the example aircraft, achieving smooth flight and a successful landing. On this basis, a complete demonstration platform solution is designed. (3) This thesis also proposes an actor-adaptor-critic algorithm to improve the generalization of the algorithm in nonstationary environments. It makes targeted improvements to the actor-critic architecture: an adaptor is added to correct the action given by the actor so that the action adapts to the environment. To test the adaptability of the algorithm to a nonstationary environment, we modified the dynamic model of the agent in the integrated reinforcement learning environment to simulate changes in the environment. Experimental results in the Gym and MuJoCo environments indicate that the proposed algorithm is valid and that it adapts, to a certain degree, to changes in the environment. In addition, we apply the improved algorithm to the automatic landing mission of the carrier aircraft in the simulation environment, where it shows better adaptability in the nonstationary environment. In summary, the thesis implements a reinforcement learning automatic landing algorithm in simulation with the professional flight software X-Plane, and provides a reinforcement learning algorithm that can be used effectively in various environments. The nonstationary-environment experiments are carried out both in the integrated reinforcement learning environment and in the professional flight software, and the results show good adaptability to nonstationary environments. |
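
The reward shaping idea in item (2) can be illustrated with a minimal sketch. The abstract does not specify the shaping model, so the potential-based form used here, the `LandingState` fields, and the distance-based `potential` function are all assumptions chosen for illustration and are not the thesis's actual formulation. The sketch shows how a dense shaping term could be added to a sparse landing reward so that the agent receives feedback on every step of the approach.

```python
import math
from dataclasses import dataclass


@dataclass
class LandingState:
    """Hypothetical state: offsets from the touchdown point, in metres."""
    along_track: float
    cross_track: float
    altitude: float


def potential(state: LandingState) -> float:
    """Assumed potential: negative straight-line distance to touchdown."""
    return -math.sqrt(
        state.along_track ** 2 + state.cross_track ** 2 + state.altitude ** 2
    )


def shaped_reward(
    sparse_reward: float,
    state: LandingState,
    next_state: LandingState,
    gamma: float = 0.99,
) -> float:
    """Add a potential-based shaping term to the sparse environment reward.

    The shaping term is gamma * potential(next_state) - potential(state),
    which is positive when the aircraft moves closer to the touchdown point.
    """
    return sparse_reward + gamma * potential(next_state) - potential(state)


# Usage: the sparse reward is 0 during the approach, yet the shaped reward
# still distinguishes a step toward the deck from a step away from it.
far = LandingState(along_track=300.0, cross_track=0.0, altitude=400.0)
near = LandingState(along_track=30.0, cross_track=0.0, altitude=40.0)
approach_reward = shaped_reward(0.0, far, near)
retreat_reward = shaped_reward(0.0, near, far)
```

A potential-based term is used in this sketch because it only adds a difference of potentials between consecutive states; other shaping schemes would fit the description in the abstract equally well.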