
Trust Cooperation Mechanisms And Multi-agent Reinforcement Learning Algorithms For Sequential Social Dilemmas

Posted on: 2024-07-28    Degree: Master    Type: Thesis
Country: China    Candidate: S Gao
GTID: 2568307067492934    Subject: Computer Science and Technology
Abstract/Summary:
Multi-Agent Reinforcement Learning (MARL) is one of the most active research topics in machine learning. In recent years it has achieved a series of key results in tasks dominated by fully cooperative games. However, in more general non-cooperative game scenarios, self-interested agents may adopt non-cooperative behaviors, creating conflicts between individual and collective interests and trapping the system in deficient equilibria. Such problems are usually modeled as Sequential Social Dilemmas (SSDs) and have been studied extensively in MARL. Nevertheless, how to promote the emergence of cooperation among multiple agents and ultimately solve SSDs remains a major challenge. To this end, this thesis optimizes existing MARL algorithms through mechanism design, so that cooperation can emerge among self-interested agents and jointly improve their welfare, thereby better solving sequential social dilemmas. The main contributions of this thesis are as follows:

· First, to promote the emergence of cooperation among multiple agents, Chapter 3 proposes the Authority-Trust Cooperation Mechanism (ATCM), which changes the game structure by introducing a centralized Authority Structure. This mechanism establishes an authoritative structure in the multi-agent system, making cooperation among the agents possible. Moreover, based on reward shaping, the mechanism constructs Authority Trust, which promotes cooperation while ensuring the effective operation of the Authority Structure. The mechanism can therefore serve as a general paradigm for solving SSDs and has important theoretical significance for SSD research in MARL.

· Second, Chapter 3 also proposes the Learning to Incentivize and Sanction Cooperative Agents (LISCA) algorithm, which introduces an Incentive and Sanction Planning Agent as the centralized Authority Structure. This agent learns an Incentive/Sanction Allocation Policy and provides appropriate levels of incentives and sanctions to the other agents as intrinsic rewards. The algorithm employs Incentive/Sanction Reward Shaping to implement Authority Trust, allowing the Authority Structure to correct non-cooperative behaviors and encourage cooperative ones, thus facilitating the emergence of cooperation in multi-agent systems. However, the algorithm falls short on more complex SSD problems and, to some extent, lacks explainability.

· Third, in response to the limitations of the work in Chapter 3, Chapter 4 proposes the Externality-Based Authority-Trust Cooperation Mechanism (EBATCM). This mechanism incorporates Externality Theory into MARL to measure the impact of each agent's behavior on social welfare in SSDs, and quantifies the uncompensated interdependence effects among self-interested agents in the multi-agent system, providing a solid theoretical foundation for analyzing and explaining the causes of SSDs. Furthermore, to better balance the conflict between individual and collective interests, the mechanism offers a feasible way to internalize externalities and eliminate these uncompensated interdependence effects.

· Finally, Chapter 4 proposes the Learning Optimal Pigovian Tax (LOPT) algorithm as an improvement over LISCA. The algorithm implements the EBATCM and demonstrates that the proposed Optimal Pigovian Tax/Subsidy Reward Shaping achieves an approximately optimal Pigovian tax, thereby internalizing the approximate externalities among agents while realizing Externality-Based Authority Trust. With better explainability, the algorithm effectively eliminates uncompensated interdependence among self-interested agents, facilitating long-term, stable mutual cooperation and providing an effective solution to SSDs in MARL.

The effectiveness of the proposed algorithms was verified on SSD benchmarks with different difficulty settings. The experimental results demonstrate that the proposed algorithms not only enable cooperation to emerge in SSDs, but also promote and stabilize long-term mutual cooperation among self-interested agents, effectively suppressing defection by egoistic agents and thereby greatly improving social welfare. These results reflect the superiority of the proposed algorithms in addressing SSDs in MARL.
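The core idea behind the Pigovian tax/subsidy reward shaping described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's implementation (which learns the tax with a centralized planning agent): the function name, signature, and the externality estimates are hypothetical.

```python
import numpy as np

def pigovian_tax_shaping(rewards, externalities, tax_rate=1.0):
    """Shape each agent's extrinsic reward with a Pigovian tax/subsidy.

    Agents whose actions impose a negative externality on social welfare
    are taxed; agents producing a positive externality are subsidized,
    so each agent internalizes its external effect (hypothetical sketch).
    """
    rewards = np.asarray(rewards, dtype=float)
    externalities = np.asarray(externalities, dtype=float)
    # The transfer equals the (estimated) external effect scaled by the
    # tax rate; with an accurate estimate this aligns individual and
    # collective interests.
    return rewards + tax_rate * externalities

# Two agents with equal extrinsic rewards: agent 0 harms social welfare
# (externality -0.5) and is taxed, agent 1 benefits it (externality +0.5)
# and is subsidized.
shaped = pigovian_tax_shaping([1.0, 1.0], [-0.5, 0.5])  # → [0.5, 1.5]
```

Note that in this example the tax and subsidy cancel out, so total (social) reward is unchanged; only its allocation across agents shifts, which is what discourages defection while leaving collective welfare intact.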
Keywords/Search Tags:Multi-Agent Reinforcement Learning, Sequential Social Dilemmas, Reward Shaping, Learning to Cooperate