
Learning to cooperate

Posted on: 2004-08-25
Degree: Ph.D
Type: Dissertation
University: The University of Rochester
Candidate: Zhu, Shenghuo
Full Text: PDF
GTID: 1458390011957223
Subject: Computer Science
Abstract/Summary:
Game theory is useful not only for understanding the performance of human and autonomous game players, but is also widely employed to solve resource allocation problems in distributed decision-making systems, commonly referred to as multi-agent systems. Reinforcement learning is a promising technique by which agents in such systems can adapt their strategies. Most existing reinforcement learning algorithms are designed from a single agent's perspective and, for simplicity, assume the environment is stationary, i.e., that the distribution of the utility of each state-action pair does not change. The predominant approaches to game playing in those settings likewise assume that opponents' behaviors are stationary. In a more realistic model of multi-agent systems, however, the agents continually adapt their strategies because their utilities differ over time. Because of this non-stationarity, multi-agent systems are more sensitive to the trade-off between exploitation, which uses the best strategy found so far, and exploration, which tries to find better strategies. Exploration is especially important in changing circumstances.

Cooperating agents usually receive higher payoffs than non-cooperative ones. This research explores cooperative opportunities in unknown games. A hill-climbing exploration approach is proposed in which agents take their opponents' responses into consideration and maximize their payoffs by gradually adapting their strategies to their opponents' behaviors in iterated games. Simulations show that the agents can efficiently learn to cooperate with or compete against each other as the situation demands. The agents are also able to tolerate noise in the environment and to exploit weak opponents.

Assuming that the utility of each state-action pair is a stochastic process allows the trade-off dilemma to be described as a Brownian bandit problem, which formalizes a recency-based exploration bonus for non-stationary environments. To demonstrate the performance of the exploration bonus, agents are built using a Q-learning algorithm with smoothed best response dynamics. Simulations show that these agents can efficiently adapt to changes in their opponents' behaviors, whereas the same algorithm with Boltzmann exploration cannot. This work focuses on typical simultaneous games that represent phenomena of competition or cooperation in multi-agent environments, such as the work-and-shirk game.
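The combination described in the last paragraph, Q-learning driven by a recency-based exploration bonus and a smoothed (softmax) best response, can be illustrated with a minimal sketch. This is not the dissertation's implementation: the stateless two-action learner, the square-root-of-elapsed-time bonus, and the parameter names (bonus_scale, temperature) are assumptions made here for illustration; the actual bonus derived from the Brownian bandit formulation may take a different form.

import math
import random

class RecencyBonusQLearner:
    """Stateless Q-learner for a repeated matrix game (illustrative sketch)."""

    def __init__(self, n_actions, alpha=0.1, bonus_scale=0.5, temperature=0.1):
        self.q = [0.0] * n_actions          # estimated payoff per action
        self.last_tried = [0] * n_actions   # time step at which each action was last taken
        self.t = 0
        self.alpha = alpha                  # learning rate
        self.bonus_scale = bonus_scale      # weight of the recency bonus (assumed form)
        self.temperature = temperature      # softmax temperature for smoothed best response

    def act(self):
        # Recency bonus: actions not tried recently look more attractive,
        # which keeps exploration alive when the opponent's behavior changes.
        scores = [
            self.q[a] + self.bonus_scale * math.sqrt(self.t - self.last_tried[a])
            for a in range(len(self.q))
        ]
        # Smoothed best response: softmax over the bonus-adjusted scores.
        m = max(scores)
        weights = [math.exp((s - m) / self.temperature) for s in scores]
        r = random.random() * sum(weights)
        for a, w in enumerate(weights):
            r -= w
            if r <= 0:
                return a
        return len(weights) - 1

    def update(self, action, reward):
        self.t += 1
        self.last_tried[action] = self.t
        # Standard stateless Q-learning update toward the observed payoff.
        self.q[action] += self.alpha * (reward - self.q[action])

Two such learners placed in an iterated game (for example the work-and-shirk game mentioned above, or an iterated prisoner's dilemma) would each call act() every round, receive the payoff for the joint action, and call update(). The recency bonus keeps each agent periodically re-testing actions whose payoffs may have shifted as the opponent adapts, which is the property the abstract contrasts with plain Boltzmann exploration.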
Keywords/Search Tags: Game, Agents, Multi-agent