Multiple-input multiple-output (MIMO) transmission can greatly improve the capacity of wireless channels and has therefore been widely adopted in recent years. Operating a MIMO network requires a sound scheme for jointly optimizing base station selection and precoding. When the numbers of base stations and users are large, traditional centralized processing becomes difficult to implement because of its growing computational complexity. A distributed multi-agent cooperative scheme instead lets the base station agents exchange information while computing their resource allocation, so that the network-wide optimum is achieved or approached. In addition, algorithms built on traditional statistical models degrade severely when the statistical characteristics of the system change over time, because the model no longer matches the actual system state. This motivates the use of reinforcement learning, which gives the algorithm the ability to adapt to such changes. In this paper, a distributed multi-agent reinforcement learning method is used to jointly optimize user base station selection and the precoding matrices of the base station transmit antennas, addressing the overall wireless resource allocation problem of the MIMO network.

The work of this paper is divided into three parts.

Firstly, a beamforming method for multi-cell downlink MIMO systems based on distributed multi-agent cooperation is studied. The main work includes: 1) Under a limit on the rated transmit power of each antenna, an optimization problem of user base station selection and power allocation is formulated to maximize the system sum rate; 2) Based on a statistical selection principle, a base station selection scheme for the users is given; 3) Given the user base station selection scheme, a distributed multi-agent cooperative architecture is adopted for the user MIMO power allocation problem, and a parallel cooperative algorithm is
proposed. Each agent's algorithm starts from the water-filling algorithm, which provides the initial power allocation. The agents exchange information by broadcasting their user channel state information and intermediate power allocation results, and each agent computes the power allocation of the users in its own base station using the information broadcast by the other base stations. The optimization problem of each base station agent is non-convex; it is approximated as a convex problem by successive convex approximation (SCA) and solved by the dual gradient method; 4) Simulation results show that the proposed algorithm outperforms other classical algorithms.

Secondly, a reinforcement learning method for power allocation in multi-cell downlink MIMO systems based on distributed multi-agent cooperation is studied. The main work includes: 1) A power allocation optimization problem is formulated to maximize the system sum rate under a limit on the rated transmit power of each antenna; 2) A multi-agent reinforcement learning algorithm with centralized training and distributed execution is designed so that all transmitters can compute their power allocation in real time. The algorithm builds on deep deterministic policy gradient (DDPG) and long short-term memory (LSTM) networks and can be trained in a continuous state-action space, which addresses the dynamic wireless resource allocation problem of traditional MIMO communication systems. Its performance matches or exceeds that of traditional distributed and centralized algorithms; 3) Compared with a distributed training scheme, this architecture saves computing resources, converges faster, and avoids the instability caused by distributed training. Simulation results show that the algorithm outperforms other classical optimization algorithms.

Thirdly, this part studies the base station selection and precoding method for multi-cell downlink MIMO systems based on
distributed multi-agent reinforcement learning cooperation. The main work includes: 1) A base station selection and precoding optimization problem is formulated to maximize the system sum rate under a limit on the rated transmit power of each base station; 2) A zero-forcing precoding matrix is designed; 3) A centralized-training, distributed-execution framework is designed so that all transmitters can compute their joint base station assignment and precoding matrix design in real time, and a distributed reinforcement learning algorithm based on LSTM and DDPG is proposed to solve the optimization problem. The state of the reinforcement learning algorithm is formed by adding the user resource allocation vector to the user signal-to-noise ratio and the interference received by the user. Information exchange between agents is removed: each agent submits only the local observations of its own cell to the training center for unified training, which reduces the information transmission burden and the delay; 4) Simulation results show that the proposed algorithm outperforms other classical algorithms, which verifies its effectiveness.
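
As an illustration of the water-filling step that the first part uses to obtain an initial power allocation, the following is a minimal sketch under simplifying assumptions: a single transmitter, parallel interference-free streams, and one sum-power budget. The function name `water_filling` and the example gains are hypothetical, and the sketch does not reproduce the per-antenna power constraints or the multi-cell cooperation described above.

```python
def water_filling(gains, total_power):
    """Classic water-filling over parallel channels (illustrative sketch).

    gains[i] is the gain-to-noise ratio of stream i (assumed > 0).
    Returns powers p_i = max(0, mu - 1/gains[i]) that sum to total_power,
    which maximizes sum(log2(1 + gains[i] * p_i)) under the sum-power budget.
    """
    # Inverse gains act as "floor heights"; the best channel comes first.
    floors = sorted(1.0 / g for g in gains)
    mu = 0.0          # water level
    floor_sum = 0.0
    for k, floor in enumerate(floors, start=1):
        floor_sum += floor
        candidate = (total_power + floor_sum) / k
        if candidate <= floor:
            break     # the k-th channel would receive no power; stop here
        mu = candidate
    return [max(0.0, mu - 1.0 / g) for g in gains]


if __name__ == "__main__":
    # Hypothetical gains: two good streams and one very weak stream.
    print(water_filling([2.0, 1.0, 0.1], 1.0))
```

With these example values the weak stream lies above the water level and receives no power, while the budget is split between the two stronger streams.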