With the progress of science and technology and the development of society, quad-copter drones are playing an increasingly important role in people's daily lives. Thanks to their small size and agility, they are widely used in areas such as military reconnaissance, aerial landscape photography, and power-line inspection, bringing great convenience to daily life. As UAVs take on more of these roles, the demand for UAVs that can perform obstacle avoidance and navigation tasks autonomously grows day by day. Most common UAV navigation methods divide the navigation problem into three stages: perception, map building, and path planning. This decomposition, however, greatly increases processing latency and costs the UAV its agility advantage. In this paper, we propose an end-to-end navigation strategy based on causal reinforcement learning and causal imitation learning that learns directly from data, skipping the explicit map-building and planning steps and thus remaining highly responsive. The main contributions are as follows:

(1) Research on autonomous UAV obstacle avoidance navigation based on causal reinforcement learning. A continuous action space prevents the agent from learning effective experience from its past actions. To address this drawback, this paper proposes an Actor-Critic method that fixes the vehicle in a single plane and uses a discrete action space, so that the agent can learn from actions taken in the past. This makes the agent's sampling from the experience replay pool more efficient and the optimization process more stable, and ultimately improves the success rate of the reinforcement learning algorithm on the UAV obstacle avoidance navigation problem. Meanwhile, to address the insufficient generalization ability of end-to-end methods, this paper introduces a causal inference step into the reinforcement learning training process, mitigating the overfitting caused by insufficient interaction with the environment during training and improving the UAV's success rate at obstacle avoidance and navigation tasks in unfamiliar environments.

(2) Research on autonomous UAV obstacle avoidance navigation based on causal imitation learning. In reinforcement learning it is difficult to design a reward function that yields an ideal policy network. To address this problem, this paper proposes solving the UAV obstacle avoidance navigation problem with imitation learning, which derives reward information from expert demonstration data to optimize the policy network and thus avoids reward-function design. Manually collected demonstration data, however, is expensive and poorly diversified. This paper therefore uses a global path planning algorithm to generate reference paths and samples those paths to produce expert data, solving the shortage and low diversity of expert samples. Meanwhile, the improved heuristic function of the A* algorithm proposed in this paper keeps the expert demonstration paths farther away from obstacles and therefore more conservative in safety, which improves the navigation success rate of the learned strategy in high-speed flight. The control points of a polynomial function are used as the output of the policy network, giving the planned trajectories greater maneuverability and flexibility and providing a robust low-level design for the high-speed obstacle avoidance
navigation flight of UAVs. Similarly, to address the insufficient generalization ability of the policy network, this paper introduces a causal inference algorithm into the imitation learning process, so that the policy network maintains in the test environment the navigation success rate it achieves in the training environment.

(3) To verify the effectiveness of the proposed methods, a UAV simulation platform based on Unity and ROS is built to support the research. On this platform, comparison experiments verify the effectiveness of the discrete-action-space Actor-Critic algorithm within the causal reinforcement learning method, using the navigation success rate on random targets as the evaluation criterion. The number of steps to convergence in the training environment and the navigation success rate on random targets in the test environment are also used as quantitative metrics to verify that causal inference effectively improves the generalization ability of the reinforcement learning algorithm and reduces the policy network's overfitting to the training environment. Further experiments verify that the improved A* algorithm in the causal imitation learning method positively enhances the navigation success rate of the apprentice network, and that causal inference likewise improves the generalization ability of the imitation learning algorithm and reduces the apprentice policy network's overfitting to the training environment.
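The discrete-action Actor-Critic idea in (1) can be illustrated with a textbook one-step temporal-difference update. The action set, state dimension, and learning rates below are illustrative assumptions for the sketch, not the thesis's actual configuration:

```python
import numpy as np

# Hypothetical discrete action set: since the vehicle is fixed in one plane,
# actions can be a handful of heading commands (values here are illustrative).
ACTIONS = [-30, -15, 0, 15, 30]   # heading change in degrees (assumed)
STATE_DIM = 8                      # assumed size of the state feature vector

rng = np.random.default_rng(0)
theta = rng.normal(0, 0.1, (STATE_DIM, len(ACTIONS)))  # actor: policy logits
w = np.zeros(STATE_DIM)                                # critic: linear state value

def policy(state):
    """Softmax over the discrete actions -> a proper probability distribution."""
    logits = state @ theta
    e = np.exp(logits - logits.max())
    return e / e.sum()

def actor_critic_step(state, action_idx, reward, next_state, done,
                      gamma=0.99, lr_actor=1e-2, lr_critic=1e-1):
    """One-step TD actor-critic update (standard form, not the thesis's code)."""
    global theta, w
    v = state @ w
    v_next = 0.0 if done else next_state @ w
    td_error = reward + gamma * v_next - v
    w = w + lr_critic * td_error * state       # critic moves toward the TD target
    probs = policy(state)
    grad_log = -np.outer(state, probs)         # d log pi / d theta, all actions
    grad_log[:, action_idx] += state           # extra term for the taken action
    theta = theta + lr_actor * td_error * grad_log
    return td_error
```

Because the action set is finite, each transition stored in the replay pool names one of only five actions, which is what makes reuse of past experience straightforward compared with a continuous command.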
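The improved A* of (2), which biases expert paths away from obstacles, can be sketched as a grid planner whose step cost adds a clearance penalty. The grid representation, the penalty form, and the `clearance_weight` parameter are assumptions for illustration, not the thesis's actual heuristic:

```python
import heapq
from collections import deque

def obstacle_distance(grid):
    """Multi-source BFS: distance (in cells) from every cell to the nearest obstacle."""
    rows, cols = len(grid), len(grid[0])
    dist = [[None] * cols for _ in range(rows)]
    q = deque()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1:        # 1 marks an obstacle cell
                dist[r][c] = 0
                q.append((r, c))
    if not q:                          # no obstacles: everything is far away
        return [[rows + cols] * cols for _ in range(rows)]
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                q.append((nr, nc))
    return dist

def a_star_safe(grid, start, goal, clearance_weight=3.0):
    """A* where each step costs 1 + clearance_weight/(1 + d_obs), so cheaper
    paths keep their distance from obstacles (illustrative sketch)."""
    rows, cols = len(grid), len(grid[0])
    d_obs = obstacle_distance(grid)
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    open_heap = [(h(start), 0.0, start)]
    came_from = {start: None}
    g_best = {start: 0.0}
    closed = set()
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node in closed:
            continue
        closed.add(node)
        if node == goal:               # reconstruct the path back to start
            path = [node]
            while came_from[node] is not None:
                node = came_from[node]
                path.append(node)
            return path[::-1]
        r, c = node
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1 + clearance_weight / (1 + d_obs[nr][nc])
                if ng < g_best.get((nr, nc), float("inf")):
                    g_best[(nr, nc)] = ng
                    came_from[(nr, nc)] = node
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None
```

With `clearance_weight = 0` this reduces to plain A*; raising it trades path length for clearance, which is the conservative-safety behavior the expert paths need at high flight speed.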
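How control points output by a policy network become a smooth trajectory, as in (2), can be shown with a standard Bernstein-basis (Bézier) evaluation; the thesis's exact polynomial parameterization may differ from this sketch:

```python
import math

def bezier_point(ctrl, t):
    """Evaluate a 2-D Bézier curve defined by control points at t in [0, 1].

    The policy network outputs the control points; sweeping t from 0 to 1
    then yields the smooth trajectory the vehicle flies. This is the standard
    Bernstein-basis formula, assumed here for illustration."""
    n = len(ctrl) - 1
    x = y = 0.0
    for i, (cx, cy) in enumerate(ctrl):
        b = math.comb(n, i) * t**i * (1 - t) ** (n - i)  # Bernstein weight
        x += b * cx
        y += b * cy
    return (x, y)
```

A handful of control points thus determines an entire smooth curve, which is why a low-dimensional network output can still describe an agile high-speed trajectory.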