| Climate deterioration has become an undeniable fact in recent years. All countries have paid attention to the problem and have joined together to address it. However, climate cooperation takes place among countries that each pursue their own interests, so the purpose of collective action is ultimately to maximize individual gains. Each country therefore seeks an optimal strategy that achieves the common climate goal while protecting its own interests. The research object of this paper is the cooperative climate strategy: a multi-agent reinforcement learning algorithm is applied to simulate this strategy and to find the optimal policy of different countries under different punishment rates. The main contributions of the paper are as follows: (1) This paper proposes a multi-agent Q-learning algorithm based on Meta equilibrium (MetaQ). It adopts the game-theoretic idea of the NashQ algorithm and solves for the Q-values through the Meta equilibrium to obtain the optimal joint strategy in a multi-agent system. The paper gives the theoretical basis of the MetaQ algorithm and analyzes why it can reach the Pareto optimal solution. The time complexity of MetaQ is far lower than that of NashQ. Grid-world game simulations show that MetaQ converges well and, in the experiment, converges to the optimal strategy almost 6 times faster than NashQ. (2) This paper studies the cooperative climate strategy, which is modeled as a non-cooperative multi-agent system, and gives its investment model and punishment model. Because game-equilibrium strategies have clear advantages for studying non-cooperative multi-agent systems, this paper investigates the cooperative climate strategy using Q-learning algorithms based on Nash and Meta equilibria. 
This paper simulates the cooperative climate strategy with both the NashQ and MetaQ algorithms. The Meta equilibrium is a pure strategy: if a Pareto optimal equilibrium solution exists, the Meta equilibrium can find it, and computing a Meta equilibrium point has lower time complexity than computing a Nash equilibrium. Simulation experiments show that, for the cooperative climate strategy, MetaQ converges faster than NashQ when the probability of punishment is high, and that the joint strategy found by MetaQ is more humane and credible than that of NashQ when the punishment rate is low. |
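
The gap between a Nash equilibrium and a Pareto optimal joint strategy, and the effect of the punishment rate on it, can be illustrated with a small sketch. This is not the MetaQ algorithm and it does not compute a Meta equilibrium; it only enumerates pure-strategy Nash equilibria and Pareto optimal joint actions in a hypothetical two-country game. The payoff table `GAME`, the actions (0 = invest, 1 = withhold), the `fine` value, and all function names are assumptions made for illustration and do not come from the paper.

```python
from itertools import product

def pure_nash(payoffs):
    """Joint actions from which no single player gains by deviating alone."""
    n_a, n_b = len(payoffs), len(payoffs[0])
    result = []
    for a, b in product(range(n_a), range(n_b)):
        u0, u1 = payoffs[a][b]
        best0 = all(payoffs[x][b][0] <= u0 for x in range(n_a))
        best1 = all(payoffs[a][y][1] <= u1 for y in range(n_b))
        if best0 and best1:
            result.append((a, b))
    return result

def pareto_optimal(payoffs):
    """Joint actions that no other joint action Pareto-dominates."""
    cells = list(product(range(len(payoffs)), range(len(payoffs[0]))))

    def dominates(p, q):
        up, uq = payoffs[p[0]][p[1]], payoffs[q[0]][q[1]]
        no_worse = all(x >= y for x, y in zip(up, uq))
        better = any(x > y for x, y in zip(up, uq))
        return no_worse and better

    return [q for q in cells if not any(dominates(p, q) for p in cells)]

def with_punishment(payoffs, rate, fine):
    """Subtract the expected fine (rate * fine) from any player who withholds."""
    penalty = rate * fine
    return [
        [(u0 - penalty * (a == 1), u1 - penalty * (b == 1))
         for b, (u0, u1) in enumerate(row)]
        for a, row in enumerate(payoffs)
    ]

# Hypothetical payoffs (country 0, country 1); action 0 = invest, 1 = withhold.
GAME = [[(3, 3), (0, 4)],
        [(4, 0), (1, 1)]]

if __name__ == "__main__":
    print(pure_nash(GAME))                           # no punishment
    print(pareto_optimal(GAME))
    print(pure_nash(with_punishment(GAME, 0.5, 4)))  # high punishment rate
```

Under these assumed payoffs, the only pure Nash equilibrium without punishment is mutual withholding, which is not Pareto optimal, while a sufficiently high expected fine moves the Nash equilibrium to mutual investment.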