
Deep Reinforcement Learning Based Hybrid Precoding for mmWave Massive MIMO System

Posted on: 2022-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: Q S Wang
Full Text: PDF
GTID: 2518306740996749
Subject: Electronics and Communications Engineering
Abstract/Summary:
Millimeter wave (mmWave) communication and massive multiple-input multiple-output (MIMO) are considered key techniques for improving the capacity and spectral efficiency of wireless communication systems. Based on the channel state information (CSI) obtained at the transmitter, hybrid precoding can provide high directional gain, which compensates for the high path loss of mmWave signals, improves system performance, and ensures reliable data transmission. However, the traditional hybrid precoding algorithms widely used in mmWave massive MIMO systems still suffer from high computational complexity and locally optimal performance. In recent years, deep reinforcement learning (DRL) has been regarded as an important technology for realizing artificial intelligence (AI) and has demonstrated a powerful ability to deal with physical-layer wireless communication problems. Therefore, this thesis investigates DRL-based hybrid precoding algorithms for mmWave massive MIMO systems.

In this thesis, we first study and summarize the theoretical basis of reinforcement learning (RL) and DRL. We introduce the basic concepts of RL and describe the types of problems that RL can solve. The two mainstream families of RL, value-based and policy-based, are introduced. DRL algorithms based on neural networks are also elaborated; they use neural networks to estimate the value function or the policy gradient function so that the convergence of the agent can be accelerated. Moreover, the concept of multi-agent DRL and the corresponding algorithms are introduced in detail, including Team-Q learning, distributed Q learning, Nash-Q learning, and loosely coupled Q learning, among others.

Then, for mmWave massive MIMO downlink transmission systems, two DRL-based hybrid precoding algorithms are proposed for the perfect and imperfect CSI cases, respectively. For the perfect CSI scenario, the proposed algorithm uses the manifold optimization (MO) algorithm to obtain the analog precoder. Based on the obtained analog precoder, the deep deterministic policy gradient (DDPG) algorithm is adopted to learn the digital precoder and analog combiner. The DRL agent takes the digital precoder and analog combiner of the previous learning iteration as the state, the digital precoder and analog combiner of the current learning iteration as the action, and the spectral efficiency as the reward. The digital combiner is obtained based on the minimum mean square error (MMSE) criterion. For the imperfect CSI scenario, an upper bound on the expected spectral efficiency is derived to account for the channel estimation error, and the reward of the DRL agent is replaced by this upper bound. The digital precoder is obtained based on the modified MMSE criterion. Simulations show that the proposed DRL-based algorithm achieves higher spectral efficiency and a lower bit error rate than traditional hybrid precoding methods. Moreover, compared with deep supervised learning, the proposed algorithm adapts automatically to changes in the environment, does not require a large amount of pre-defined training data, and is more robust.

Next, we focus on hybrid precoding for the multi-user multiple-input single-output (MU-MISO) downlink system. Assuming that the base station (BS) can obtain perfect CSI, a DRL-based algorithm is proposed to design the hybrid precoder. The proposed algorithm employs the DDPG algorithm to learn the analog precoder at the BS. At each learning iteration, the zero-forcing (ZF) precoder is first calculated from the effective channel vectors. The obtained ZF precoder is then used as the digital precoder to eliminate inter-user interference. For the design of the analog precoder, which is subject to a non-convex constant modulus constraint, the DDPG agent takes the analog precoder of the previous learning iteration as the state, the analog precoder of the current learning iteration as the action, and the user sum rate as the reward, so as to maximize the sum-rate performance and reduce the time consumption. Simulations show that the proposed DRL-based hybrid precoding algorithm for the MU-MISO system achieves a higher sum rate than the benchmarks while requiring less time to converge.

Finally, in MU-MISO hybrid precoding systems, single-agent DRL suffers from low exploration efficiency and may fall into a locally optimal solution. Thus, we propose a multi-agent deep reinforcement learning (MADRL) based hybrid precoding algorithm. The proposed algorithm adopts the actor-critic architecture, which consists of multiple distributed actor networks, multiple improved experience replay buffers with priority, a single centralized critic network, and a single centralized reward prediction network. The actor networks explore the environment separately and output their own actions. The centralized critic network coordinates the exploration of all the actors based on the state-action pairs from the replay buffers. In this way, the knowledge learned by one actor can be shared with the others so that convergence is accelerated. Meanwhile, a priority is defined for each state-action pair and stored in the replay buffers, and the experiences are then sampled according to this priority to improve the sampling efficiency. Moreover, to deal with the insufficient amount of information in the reward function, the centralized reward prediction network modifies the reward value fed back by the environment at each learning iteration to further accelerate convergence.
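
The per-iteration ZF step and sum-rate reward described for the MU-MISO algorithm can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the function name `zf_sum_rate`, the total-power normalization, the i.i.d. Rayleigh channel used in the demo (rather than a mmWave channel model), and the random phase-only analog precoder that stands in for the action a DDPG actor would output.

```python
import numpy as np

def zf_sum_rate(H, F_rf, power=1.0, noise_var=1.0):
    """Hypothetical helper: ZF digital precoder and sum rate for MU-MISO.

    H:    (K, N) complex channel matrix; row k is the channel of user k.
    F_rf: (N, N_rf) analog precoder with constant-modulus entries.
    Returns (F_bb, sum_rate), with sum_rate in bit/s/Hz.
    """
    # Effective channel seen by the digital precoder.
    H_eff = H @ F_rf                                   # (K, N_rf)
    # Zero forcing: right pseudo-inverse of the effective channel.
    F_bb = np.linalg.pinv(H_eff)                       # (N_rf, K)
    # Assumed normalization: total transmit power ||F_rf F_bb||_F^2 = power.
    scale = np.sqrt(power) / np.linalg.norm(F_rf @ F_bb, "fro")
    F_bb = F_bb * scale
    # gains[k, j] = |h_k F_rf f_j|^2: power of stream j received at user k.
    gains = np.abs(H_eff @ F_bb) ** 2
    signal = np.diag(gains)
    interference = gains.sum(axis=1) - signal
    sinr = signal / (interference + noise_var)
    return F_bb, float(np.sum(np.log2(1.0 + sinr)))

# Demo with assumed sizes: 4 users, 32 BS antennas, 4 RF chains.
rng = np.random.default_rng(0)
K, N, N_rf = 4, 32, 4
H = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)
# Random phases stand in for the analog precoder a DDPG actor would produce.
F_rf = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=(N, N_rf))) / np.sqrt(N)
F_bb, rate = zf_sum_rate(H, F_rf, power=1.0, noise_var=0.1)
print(f"sum rate: {rate:.3f} bit/s/Hz")
```

In a training loop of the kind the abstract outlines, the returned sum rate would serve as the reward for the analog precoder chosen at that iteration. Because ZF removes inter-user interference whenever the effective channel has full row rank, the reward then depends only on how much signal power the analog precoder delivers to each user.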
Keywords/Search Tags: Massive MIMO, mmWave communication, hybrid precoding, DRL, multi-agent system