The demand for higher data rates and wider coverage in wireless communication is growing by the day, driven by the development and adoption of big data and artificial intelligence. To meet this demand, the deployment of dense millimeter-wave (mmWave) cellular networks has become a trend in the evolution of wireless networks. Dense deployment, however, introduces significant inter-cell interference and degrades wireless transmission performance. In a multi-cell cellular system, cooperative beamforming coordinates beam design through information exchange among base stations and can effectively suppress inter-cell interference. Traditional cooperative beamforming algorithms rely on complete channel state information (CSI), but obtaining complete CSI in real-world communication environments is difficult, which limits their performance. Deep reinforcement learning (DRL) has recently been shown to learn near-optimal policies in complex dynamic environments. This work therefore applies DRL to the multi-cell cooperative beamforming optimization problem in millimeter-wave multiuser multiple-input single-output (MU-MISO) communication systems.

The main research work of this paper comprises three parts:

1. A DRL-based dual-model joint beamforming and power control algorithm (JPCBF) is proposed to address the fact that traditional beamforming algorithms depend heavily on CSI quality and are unsuitable for rapidly changing practical systems. To tackle the joint optimization problem, each base station uses an information exchange protocol to sense its environment, and the algorithm adopts a dual-model structure with centralized training and distributed execution. Each base station first gathers local samples and uploads them to the cloud. The cloud trains a deep Q-network (DQN) on the uploaded samples to design the beamforming, while a deep deterministic policy gradient (DDPG) model replaces DQN for power control, overcoming the limitation that DQN is unsuited to continuous action variables. Once cloud training is complete, the models are distributed to all base stations for distributed execution and further local sample collection. Simulation results show that the spectral efficiency of the proposed approach is superior to both a traditional beamforming algorithm and a DQN-based hybrid beamforming and power control algorithm.

2. A federated multi-agent scheme is investigated, motivated by the observation that centralized training requires each base station to send local environment information to the cloud server, incurring high transmission overhead, whereas purely distributed training has low overhead but converges slowly or not at all. In this scheme, each base station acts as an autonomous agent and trains its own local model on local data. After a fixed number of local training steps, each base station uploads its local model to the cloud server for federated aggregation, and the resulting global model is broadcast back to all base stations for a fresh round of local training, as sketched below. The convergence of the method is analyzed for both convex and non-convex loss functions. Simulation results indicate that the federated multi-agent architecture improves system performance while also accelerating convergence.
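The following is a minimal, purely illustrative sketch of one such federated round, assuming FedAvg-style parameter averaging over PyTorch models; the function names, the `local_update` callback, and the loop structure are assumptions for illustration, not the paper's actual implementation.

```python
# Minimal sketch of one federated aggregation round (FedAvg-style averaging).
# All names (federated_average, local_update, num_local_steps) are illustrative.
import copy
from typing import Callable, Dict, List

import torch
import torch.nn as nn


def federated_average(local_models: List[nn.Module]) -> Dict[str, torch.Tensor]:
    """Average the parameters of all base stations' local models."""
    global_state = copy.deepcopy(local_models[0].state_dict())
    for key in global_state:
        stacked = torch.stack([m.state_dict()[key].float() for m in local_models])
        global_state[key] = stacked.mean(dim=0)
    return global_state


def federated_round(local_models: List[nn.Module],
                    local_update: Callable[[nn.Module], None],
                    num_local_steps: int) -> None:
    # 1. Each base station (agent) trains on its own local samples.
    for model in local_models:
        for _ in range(num_local_steps):
            local_update(model)  # one gradient step on local data
    # 2. The cloud aggregates the uploaded local models into a global model...
    global_state = federated_average(local_models)
    # 3. ...and broadcasts it back to every base station for the next round.
    for model in local_models:
        model.load_state_dict(global_state)
```

Only model parameters cross the network in this scheme, which is why its transmission overhead is lower than uploading raw environment samples to the cloud.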
3. A federated learning-deep deterministic policy gradient (FL-DDPG) hybrid beamforming algorithm is proposed on top of the federated multi-agent framework. In this algorithm, each base station first uses the limited information exchange protocol to sense its environment, then applies DDPG to design an analog precoder that satisfies the hardware constraints based on the obtained environmental information, eliminating both inter-cell and intra-cell interference. Finally, the base stations' local models are aggregated in a federated manner to improve model accuracy and accelerate convergence. During training, a beamforming scheme based on orthogonal matching pursuit (OMP) generates a fixed number of samples under complete CSI to speed up convergence in the early stage, and only those hybrid beamforming samples that satisfy the power and constant-modulus constraints are retained in the experience pool; a constraint-handling sketch follows below. The computational complexity and transmission overhead of the proposed algorithm are analyzed. Simulation results demonstrate that, compared with other baseline approaches, the method improves spectral efficiency and convergence speed while lowering transmission overhead.
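The snippet below is an illustrative sketch of the constraint handling described in part 3, assuming a phase-shifter analog precoder (constant-modulus entries) and a total transmit power budget; the shapes, names, and the power check are assumptions for illustration, not the paper's exact design.

```python
# Illustrative constraint handling for hybrid beamforming samples.
# Assumptions: analog precoder F_rf built from phase shifters (constant
# modulus), digital precoder F_bb, and a total power budget p_max.
import numpy as np


def to_constant_modulus(phases: np.ndarray, n_antennas: int) -> np.ndarray:
    """Map actor-network phase outputs to a constant-modulus analog precoder."""
    # Every entry has fixed magnitude 1/sqrt(N); only the phase is learned.
    return np.exp(1j * phases) / np.sqrt(n_antennas)


def satisfies_power(F_rf: np.ndarray, F_bb: np.ndarray, p_max: float) -> bool:
    """Check the hybrid precoder F_rf @ F_bb against the power budget."""
    return np.linalg.norm(F_rf @ F_bb, "fro") ** 2 <= p_max


# Hypothetical usage: only transitions whose hybrid precoder passes this
# check would be stored in the DDPG experience pool.
rng = np.random.default_rng(0)
n_ant, n_rf, n_users = 16, 4, 4
F_rf = to_constant_modulus(rng.uniform(0, 2 * np.pi, (n_ant, n_rf)), n_ant)
F_bb = rng.standard_normal((n_rf, n_users)) + 1j * rng.standard_normal((n_rf, n_users))
keep_sample = satisfies_power(F_rf, F_bb, p_max=float(n_users))
```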