With advances in control and communication technologies, UAVs have come to play an important role in both military and civilian applications. To complete more complex tasks and to improve the efficiency and effectiveness of task execution, the Flying Ad-hoc Network (FANET) has emerged and attracted growing attention. The access protocol, as a lower-layer protocol in the communication stack, has a major impact on the communication performance of FANET. Although the decentralized and self-organizing characteristics of FANET suit civilian and military tasks in complex and remote areas, its high-speed node movement and frequent topology changes make it difficult for traditional access protocols to meet specific QoS requirements. These same characteristics also place demands on the adaptability of FANET. Therefore, based on the characteristics of FANET and the practical application scenario of joint reconnaissance, this paper uses reinforcement learning algorithms to study and optimize contention-based access protocols, with the aim of improving communication performance and enhancing adaptability. The main research contents of this paper are as follows.

First, this paper studies FANET access protocols and reinforcement learning theory. It classifies the various access protocols, analyzes the advantages and disadvantages of each type, and points out that contention-based protocols are the most suitable for FANET. The key issues in the access process are also discussed. The paper then reviews reinforcement learning theory, analyzes the Markov Decision Process (MDP) and gives a simple derivation, and introduces and analyzes several commonly used reinforcement learning algorithms.

Second, to improve the adaptability of the access protocol, this paper uses the Actor-Critic reinforcement learning algorithm to adaptively optimize p-CSMA. The access probability is adjusted dynamically to avoid the degradation in communication performance caused by setting it too high or too low. In actual tasks, the access probability can be adjusted automatically as the task progresses and the UAV topology changes. Simulation of the access process of UAVs within the same interference domain demonstrates the effectiveness of the optimization in terms of access success rate and channel utilization.

Finally, because p-CSMA lacks a reservation mechanism, collisions become severe under high load, so this paper studies and optimizes CSMA/CA, another contention-based protocol. The paper first introduces the CSMA/CA protocol and the related DCF mechanism, and shows that during the access process the size of the contention window has an important influence on the collision probability of FANET data frames, which in turn affects communication performance. The access process is therefore modeled as an MDP, and the Q-Learning and Policy Gradient reinforcement learning algorithms are proposed to dynamically select and adjust the contention window. Simulation verifies the resulting improvements in channel utilization, back-off delay, and fairness.
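To illustrate why the access probability in p-CSMA should be neither too high nor too low, the following minimal sketch may help. It is not the Actor-Critic method described above. It assumes an idealized slotted model in which all n nodes in one interference domain are backlogged and each transmits independently with probability p in a slot, so a slot succeeds only when exactly one node transmits. The function names `slot_success_probability` and `best_access_probability` are hypothetical and chosen only for this illustration.

```python
def slot_success_probability(n_nodes: int, p: float) -> float:
    """Probability that exactly one of n_nodes transmits in a slot.

    Assumption: every node is backlogged and transmits independently
    with probability p (idealized slotted p-persistent model).
    """
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return n_nodes * p * (1.0 - p) ** (n_nodes - 1)


def best_access_probability(n_nodes: int, grid: int = 1000) -> float:
    """Grid search for the p that maximizes the slot success probability."""
    candidates = [i / grid for i in range(1, grid + 1)]
    return max(candidates, key=lambda p: slot_success_probability(n_nodes, p))


if __name__ == "__main__":
    n = 10
    # A low, a near-optimal, and a high access probability for comparison.
    for p in (0.01, 0.1, 0.5):
        print(f"n={n}, p={p}: success={slot_success_probability(n, p):.4f}")
    print("best p on grid:", best_access_probability(n))
```

Under these assumptions the success probability peaks near p = 1/n, so the best setting shifts whenever the number of contending UAVs changes. This is one way to see why a learned, adaptive access probability could be preferable to a fixed one.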