Research And Simulation Of The Cooperative Climate Strategy Based On Multi-agent Q Learning Algorithm

Posted on:2013-01-15

Degree:Master

Type:Thesis

Country:China

Candidate:Y Q Pu

Full Text:PDF

GTID:2211330371961624

Subject:Computer application technology

Abstract/Summary:

Climate deterioration is now an established fact. Countries around the world have recognized the problem and joined together to address it. However, climate cooperation takes place among countries that each pursue their own interests, so the purpose of collective action is ultimately to maximize individual interests. Countries therefore seek an optimal strategy that achieves the common climate goal while protecting their own interests. This thesis studies the cooperative climate strategy, applying multi-agent reinforcement learning algorithms to simulate it and to find the optimal policies of different countries under different punishment rates. The main contributions of the thesis are as follows:

(1) The thesis proposes MetaQ, a multi-agent Q learning algorithm based on Meta equilibrium. It borrows the game-theoretic idea of the NashQ algorithm and computes Q values through the Meta equilibrium to obtain the optimal joint strategy in a multi-agent system. The thesis gives the theoretical basis of the MetaQ algorithm and analyzes why it can reach the Pareto optimal solution. The time complexity of MetaQ is far lower than that of NashQ. Grid world game simulations show that MetaQ converges well and, in the experiment, converges to the optimal strategy almost 6 times faster than NashQ.

(2) The thesis studies the cooperative climate strategy problem, which it defines as a non-cooperative multi-agent system, and gives its investment model and punishment model. Because game equilibrium strategies have clear advantages for studying non-cooperative multi-agent systems, the thesis investigates the cooperative climate strategy using Q learning algorithms based on Nash and Meta equilibria. The cooperative climate strategy is simulated with the NashQ and MetaQ algorithms respectively. The Meta equilibrium is a pure-strategy equilibrium: if a Pareto optimal equilibrium exists, the Meta equilibrium can find it, and computing a Meta equilibrium point has lower time complexity than computing a Nash equilibrium. Simulation experiments show that, for the cooperative climate strategy, MetaQ converges faster than NashQ when the probability of punishment is high, and the joint strategy found by MetaQ is more humane and credible than that of NashQ when the punishment rate is low.
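
The abstract does not spell out the Meta equilibrium computation, so the following Python sketch is only an illustration of the general idea, not the thesis's MetaQ algorithm. It assumes a two-country stage game with hypothetical actions `abate` and `pollute` and made-up payoffs, and it stands in a simple rule (choose a Pareto optimal pure joint action, ties broken by total value) for the Meta equilibrium operator. The update in `q_update` is the usual equilibrium-based form used by NashQ-style learners, with the equilibrium value of the next state supplied by the caller.

```python
# Illustrative sketch only: the payoffs, action names and the selection
# rule below are assumptions, not taken from the thesis.

def dominates(u, v):
    """True if payoff vector u Pareto-dominates payoff vector v."""
    return (all(x >= y for x, y in zip(u, v))
            and any(x > y for x, y in zip(u, v)))

def pareto_front(stage_q):
    """Joint actions whose payoff vector is not dominated by any other."""
    return [a for a, u in stage_q.items()
            if not any(dominates(v, u) for v in stage_q.values())]

def pure_nash(stage_q, actions):
    """Pure joint actions from which no single agent gains by deviating."""
    stable = []
    for joint, u in stage_q.items():
        ok = True
        for i in range(len(joint)):
            for alt in actions:
                deviation = joint[:i] + (alt,) + joint[i + 1:]
                if stage_q[deviation][i] > u[i]:
                    ok = False
        if ok:
            stable.append(joint)
    return stable

def select_pareto_joint_action(stage_q):
    """Stand-in for the Meta equilibrium step: pick a Pareto optimal pure
    joint action, preferring the largest total value (assumed tie-break)."""
    return max(pareto_front(stage_q), key=lambda a: sum(stage_q[a]))

def q_update(old_values, rewards, next_values, alpha=0.1, gamma=0.9):
    """Per-agent update Q_i <- (1 - alpha) Q_i + alpha (r_i + gamma V_i),
    where V_i is agent i's value at the joint action chosen in the next state."""
    return tuple((1 - alpha) * q + alpha * (r + gamma * v)
                 for q, r, v in zip(old_values, rewards, next_values))

# Hypothetical stage game with a prisoner's-dilemma structure.
ACTIONS = ("abate", "pollute")
STAGE_Q = {
    ("abate", "abate"): (3.0, 3.0),
    ("abate", "pollute"): (0.0, 5.0),
    ("pollute", "abate"): (5.0, 0.0),
    ("pollute", "pollute"): (1.0, 1.0),
}

nash_choice = pure_nash(STAGE_Q, ACTIONS)
pareto_choice = select_pareto_joint_action(STAGE_Q)
```

In this toy game the only pure Nash joint action is mutual `pollute`, while the Pareto-based rule selects mutual `abate`, which gives a feel for how the two equilibrium concepts can yield different joint strategies. Both helper functions only enumerate pure joint actions; a full NashQ learner would generally also need to solve for mixed-strategy equilibria, which is more expensive.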

Keywords/Search Tags:

cooperation climate strategy, multi-agent system, reinforcement learning, Q learning algorithm, game theory

Related items

1	Paper-making Process Controller Design Of Reinforcement Learning
2	Research On Multi Agent Transport System Scheduling Based On Deep Reinforcement Learning
3	Researches On International Political Game Of Climate And China's Strategy
4	Research On Modeling And Control Algorithm Of Denitrification System Of Thermal Power Units Based On Deep Reinforcement Learning
5	Research On Neural Combination Optimization And Reinforcement Learning Methods For Multi-Unmanned Vehicles Ground Search And Rescue Task
6	Research On The Automatic Compliant Assembly Strategy Based On Deep Reinforcement Learning
7	Global Climate Cooperation Countermeasure Analysis And Simulation Research
8	Research On Combined Heat And Power Economic Emission Dispatch System Based On Reinforcement Learning Multi-Objective Differential Evolution Algorithm
9	Operational Feedback Control Based On Reinforcement Learning For Mix Separation Process With Dropout
10	Research And Application Of Adaptive Decision Of Directional Drilling System Based On Reinforcement Learning