| In the complex and dynamically changing underwater acoustic (UWA) channel, transmission attenuation is high. The channel's joint variation in time, space, and frequency makes the data link very unstable, so data transmission is easily interrupted. To better select relay nodes and improve the throughput of UWA sensor networks, this paper constructs a cooperative data transmission network model and, based on an analysis of UWA channel characteristics, proposes a relay selection scheme based on reinforcement learning (RL). (1) Because the actual UWA channel has a long transmission delay, relay selection suffers from inaccurate choices caused by outdated channel state information (CSI). The RL-based relay selection scheme is constructed by defining the action set, state set, and action selection strategy, with a Markov prediction model used to predict the channel state. Simulation results show that the scheme achieves higher throughput than a relay selection scheme without channel prediction. (2) The state and reward functions of RL-based UWA cooperative communication are constructed to account for channel variability and long transmission delays. The simulated annealing (SA) algorithm is combined with RL to form a fast reinforcement learning (FRL) scheme. The state is the combination of delayed CSI and system mutual information, and the reward is a joint function of the system mutual information and the access delay associated with each candidate relay node. During learning, the exploration factor of RL is dynamically adjusted by the cooling factor of the SA algorithm. An FRL scheme with a pretraining process is also proposed for practical UWA network implementation. Simulation results show that the scheme selects the best cooperative relay node, with good channel quality and small access delay, and that the proposed SA-FRL scheme converges faster and achieves higher throughput than a scheme that does not consider access delay. In summary, the RL-based cooperative UWA communication scheme proposed in this paper offers fast convergence, high throughput, and robustness under measured UWA channels compared with cooperative relay selection schemes that do not consider both channel quality and transmission delay. |
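
The idea of letting an SA cooling factor drive the RL exploration factor can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a stateless (bandit-style) value update rather than the full state built from delayed CSI and mutual information, a geometric cooling schedule, and a hypothetical reward of the form mutual information minus a weighted access delay. All names and parameter values (`sa_epsilon`, `cooling`, `delay_weight`, and so on) are illustrative assumptions.

```python
import random


def sa_epsilon(step, eps0=1.0, cooling=0.99, eps_min=0.01):
    """Exploration factor that decays geometrically, like an SA temperature.

    The geometric schedule and its constants are assumptions for this sketch.
    """
    return max(eps_min, eps0 * cooling ** step)


def reward(mutual_info, access_delay, delay_weight=1.0):
    """Hypothetical joint reward: favor high mutual information, low delay."""
    return mutual_info - delay_weight * access_delay


def select_relay(q_values, eps, rng):
    """Epsilon-greedy choice: explore with probability eps, else exploit."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def train(relays, episodes=2000, alpha=0.1, noise_std=0.05, seed=0):
    """Learn one value per relay from noisy reward samples.

    relays: list of (mutual_info, access_delay) pairs, assumed known to the
    simulator only; the learner sees just the noisy rewards.
    """
    rng = random.Random(seed)
    q_values = [0.0] * len(relays)
    for step in range(episodes):
        eps = sa_epsilon(step)  # cooling schedule sets the exploration rate
        choice = select_relay(q_values, eps, rng)
        mi, delay = relays[choice]
        sample = reward(mi, delay) + rng.gauss(0.0, noise_std)
        # Incremental update toward the observed reward
        q_values[choice] += alpha * (sample - q_values[choice])
    return q_values


if __name__ == "__main__":
    # Three hypothetical relays: (mutual information, access delay)
    candidates = [(1.0, 0.8), (1.2, 0.3), (0.9, 0.4)]
    learned = train(candidates)
    best = max(range(len(learned)), key=learned.__getitem__)
    print("learned values:", [round(v, 2) for v in learned])
    print("selected relay index:", best)
```

Early episodes explore almost uniformly because the exploration factor starts near 1; as it cools toward `eps_min`, the learner increasingly exploits the relay whose estimated reward is highest. A slower cooling factor spends more episodes exploring, which trades convergence speed for a lower risk of settling on a suboptimal relay.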