Rapid progress in unmanned aerial vehicle (UAV) technology, machine learning, and computer science has made UAVs increasingly widely used. UAVs are already applied extensively in the military field, and their civilian applications are expected to be broad as well. UAV research is currently active both domestically and internationally, but it still focuses mainly on obstacle avoidance control, visual tracking, target detection, and three-dimensional environment reconstruction. The primary issue, however, is how a UAV can fly autonomously. The essence of sense and avoid (SAA) is that the SAA system is triggered when the distance between an intruder and the UAV falls below the minimum separation distance defined between them, and a corresponding evasion strategy is then adopted. Specifically, the SAA system detects the presence of an intruder through its sensing system, and the UAV maneuvers around the intruder according to the result given by the decision-making mechanism. The decision-making mechanism is the core of SAA: its implementation determines the autonomy and flight safety of the UAV. This paper studies the decision-making mechanism in the SAA system of a UAV. A Markov decision process (MDP) is established to model the UAV, with its state set, action set, reward function, and state transition function defined, and a dynamic programming algorithm is applied to compute the optimal policy. A comparison of the policy iteration and value iteration algorithms leads to the conclusion that value iteration performs better when the state set is large. However, because it depends on prior knowledge of the environment, the MDP-based decision-making mechanism is not well suited to practical application. This paper therefore uses the Monte Carlo method to improve the decision-making mechanism: it learns directly from experience gained by interacting with the environment, finds the optimal policy, and reduces computational complexity. The First-Visit Monte Carlo (FVMC) algorithm and the on-policy Monte Carlo algorithm are then compared through simulation, and the on-policy Monte Carlo algorithm is found to achieve better overall performance than FVMC. Finally, the decision-making mechanism is simulated with different numbers of intruders and different map scales, verifying the effectiveness of the on-policy Monte Carlo algorithm in solving for the optimal policy.
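As an illustration of the MDP formulation and dynamic programming step summarized above, the following is a minimal sketch of value iteration on a toy grid map. It is not the model used in this paper: the 4x4 grid, the static intruder cells, the reward values, the discount factor, and all function names are assumptions chosen only to make the example self-contained.

```python
# Hypothetical toy MDP: a UAV on a small grid with static intruder cells.
# Grid size, rewards, and discount factor are illustrative assumptions.
ROWS, COLS = 4, 4
GOAL = (3, 3)
INTRUDERS = {(1, 1), (2, 2)}
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
GAMMA = 0.9


def step(state, action):
    """Deterministic transition: move one cell, staying put at the map edge."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if 0 <= r < ROWS and 0 <= c < COLS:
        return (r, c)
    return state


def reward(next_state):
    """Reward received for arriving in next_state."""
    if next_state == GOAL:
        return 100.0
    if next_state in INTRUDERS:
        return -100.0  # heavy penalty for violating separation
    return -1.0  # small cost per step to encourage short paths


def backup(state, action, values):
    """One-step lookahead: immediate reward plus discounted successor value."""
    nxt = step(state, action)
    return reward(nxt) + GAMMA * values[nxt]


def value_iteration(theta=1e-6):
    """Repeat the Bellman optimality backup until values change by < theta."""
    states = [(r, c) for r in range(ROWS) for c in range(COLS)]
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s == GOAL:
                continue  # terminal state keeps value 0
            best = max(backup(s, a, values) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < theta:
            return values


def greedy_policy(values):
    """In every non-terminal state, pick the action with the best backup."""
    return {
        s: max(ACTIONS, key=lambda a, s=s: backup(s, a, values))
        for s in values
        if s != GOAL
    }


def rollout(policy, start=(0, 0), max_steps=50):
    """Follow the policy from start and return the list of visited cells."""
    path, s = [start], start
    while s != GOAL and len(path) <= max_steps:
        s = step(s, policy[s])
        path.append(s)
    return path


if __name__ == "__main__":
    policy = greedy_policy(value_iteration())
    print(rollout(policy))
```

Under these assumptions the greedy policy steers the UAV from the top-left corner to the goal while keeping out of the intruder cells. The sketch relies on a fully known transition and reward model, which is the prior-knowledge dependence that motivates the Monte Carlo approach described in the abstract.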