| Communication in underwater wireless sensor networks is degraded by many factors, such as strong noise interference, a complex and time-varying channel environment, and the limited energy of sensor nodes. In these networks, the adaptation problem at the physical layer and the routing decision problem at the network layer can both be described as Markov decision processes. Reinforcement learning can therefore be used to address them, and network optimization algorithms can be designed on that basis to improve the communication performance of the network. First, the protocol stack and common network topologies of underwater wireless sensor networks are introduced, followed by adaptive modulation technology for underwater acoustic communication and traditional adaptive modulation algorithms based on threshold partitioning. The relevant theory of reinforcement learning is then briefly reviewed, showing that reinforcement learning can be applied to optimization algorithms for underwater wireless sensor networks. Next, the physical layer optimization algorithm is proposed. An adaptive modulation system based on reinforcement learning is introduced, followed by two adaptive modulation schemes based on Q-learning and SARSA. Because the low-dimensional state values used in the three adaptive modulation algorithms above cannot accurately represent the channel state, this paper proposes a DQN-based adaptive modulation scheme. A channel set is simulated from measured Arctic sound velocity profile data, and two-dimensional and three-dimensional underwater wireless sensor network communication is simulated by varying the transmitter and receiver positions. On this basis, the four adaptive modulation algorithms are simulated and verified. The simulation results show that, in both the two-dimensional and three-dimensional communication scenarios, the DQN-based adaptive modulation algorithm proposed in this paper improves on the other three adaptive algorithms by at least 2.74%. Validation against data from a Songhua River communication experiment further confirms that, even when the channel set is small, the proposed algorithm still achieves an improvement of at least 1%. Finally, the network layer optimization algorithm is proposed. To extend the network lifetime, this paper proposes QEEDBR, a Q-learning-based energy-saving depth routing algorithm for underwater wireless sensor networks. QEEDBR jointly considers three factors: node depth, node residual energy, and the number of two-hop nodes. From these it establishes the priority order of nodes for data forwarding, which avoids network hotspots, multipath transmission, node bypass, node holes, and similar phenomena. The optimal parameters of QEEDBR are determined through simulation, and its performance is then analyzed in simulation experiments. These experiments show that, compared with the DBR and EEDBR algorithms, QEEDBR performs strongly in both packet delivery rate and end-to-end latency, and that it effectively extends the network lifetime. With 20 network nodes, the performance is improved by 25.85% and 12.98%, respectively, and the margin of improvement grows further as the number of nodes increases. |
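
The abstract names Q-learning and SARSA as the two tabular adaptive modulation schemes. As a minimal sketch of how their update rules differ, the Python below assumes a Q-table indexed by a quantized channel-state index (state) and a modulation-mode index (action), with a scalar reward such as achieved throughput. The function names, the learning rate `alpha`, the discount factor `gamma`, and the state/action encoding are illustrative assumptions, not details taken from the paper.

```python
import random

def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action in the next state."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])

def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually chosen next."""
    target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])

def epsilon_greedy(q, s, epsilon, rng):
    """Pick a random modulation mode with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[s]))
    return max(range(len(q[s])), key=lambda a: q[s][a])

# Hypothetical toy table: 2 channel states x 2 modulation modes.
q_table = [[0.0, 0.0], [1.0, 2.0]]
rng = random.Random(0)
mode = epsilon_greedy(q_table, 1, epsilon=0.0, rng=rng)  # greedy choice in state 1
q_learning_update(q_table, s=0, a=0, r=1.0, s_next=1)
```

The only difference between the two rules is the bootstrap term: Q-learning uses the maximum Q-value in the next state, while SARSA uses the Q-value of the action the policy actually selects. A DQN-based scheme replaces the table with a neural network so that higher-dimensional channel-state inputs can be used.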