Research On Imperfect Information Game Based On Q Learning Algorithm

Posted on: 2016-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: C Li
GTID: 2348330503986906
Subject: Computer Science and Technology
Abstract/Summary:
In an imperfect information game, players have non-singleton information sets, meaning that they have only partial knowledge of the state of the game. This makes imperfect information games more complex, more competitive, and more meaningful to study, which has attracted many researchers in China and abroad. A program for an imperfect information game consists of data representation, move-generation rules, a game tree, and an evaluation function. The evaluation function is the most important part: like the human brain, it judges the pros and cons of the current situation and guides the agent's choice of action. The quality of the evaluation function directly reflects the playing strength of the artificial agent. Optimizing the evaluation function is therefore of great significance for imperfect information games.

In this research, the imperfect information game model is converted into a Partially Observable Markov Decision Process (POMDP), an extension of the Markov Decision Process (MDP) model used in reinforcement learning. However, the Q-learning algorithm in reinforcement learning is suited only to the MDP model. When it is used for the evaluation function of an imperfect information game under the POMDP model, it runs into state confusion, difficulty in representing the Q value, delayed returns, and other issues. This research studies two aspects, the value function over the state space and search over the strategy space, and proposes optimizing the evaluation function of the imperfect information game with an improved Q-learning algorithm.

In an imperfect information game, different actual states can produce identical observations, which leads to the state confusion problem. To solve it, the research proposes combining the sequence of consecutive observable states with eligibility traces.
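The idea of keying the value function on a short sequence of observations and spreading credit with eligibility traces can be illustrated with a minimal sketch. The abstract does not give the thesis's actual update rule, so everything here is an assumption for illustration: the class name `HistoryQLambda`, the history length, the hyperparameters, and the use of a naive tabular Q(λ)-style update with accumulating traces.

```python
from collections import defaultdict, deque


class HistoryQLambda:
    """Hypothetical tabular agent: state = tuple of the last k observations."""

    def __init__(self, actions, history_len=2, alpha=0.1, gamma=0.9, lam=0.8):
        self.actions = list(actions)
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.q = defaultdict(float)      # (history, action) -> Q value
        self.trace = defaultdict(float)  # (history, action) -> eligibility
        self.history = deque(maxlen=history_len)

    def reset(self):
        """Start a new episode: forget past observations and traces."""
        self.history.clear()
        self.trace.clear()

    def observe(self, obs):
        """Append an observation and return the history used as the state."""
        self.history.append(obs)
        return tuple(self.history)

    def greedy(self, state):
        """Action with the highest current Q value for this history."""
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, done):
        """Naive Q(lambda)-style update (no trace cutting; an assumption)."""
        best_next = 0.0 if done else max(
            self.q[(next_state, a)] for a in self.actions
        )
        delta = reward + self.gamma * best_next - self.q[(state, action)]
        self.trace[(state, action)] += 1.0  # accumulating trace
        for key in list(self.trace):
            self.q[key] += self.alpha * delta * self.trace[key]
            self.trace[key] *= self.gamma * self.lam  # decay eligibility
```

Two situations that end in the same observation but were reached through different earlier observations map to different keys, so their Q values are learned separately. The traces let a reward received at the end of a hand also adjust the earlier history-action pairs that led to it.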
The state space of an imperfect information game is huge: the state information of two-player limit Texas Hold'em poker alone amounts to 3.19×10^14, so the evaluation function cannot be represented by a traditional Q-value table. To solve this problem, the research proposes using an artificial neural network to represent the Q value. In addition, the return for the current action cannot be obtained until the game is over, which leads to the delayed-return problem. To solve this problem, the research proposes using the UCT (Upper Confidence Bounds Applied to Trees) algorithm to calculate returns.

This research has implemented a Texas Hold'em poker system and a “landlords” system based on the improved Q-learning algorithm. The improved Q-learning algorithm can guide the agent to select reasonable actions and is superior to traditional evaluation functions.
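For the UCT step, the abstract does not state which selection formula the thesis uses. The sketch below assumes the standard UCB1 rule that UCT is usually built on: each action is scored by its average simulated return plus an exploration bonus. The function names, the `stats` layout, and the constant `c` are hypothetical choices for illustration.

```python
import math


def ucb1_score(total_return, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 value of one child: mean return plus an exploration bonus."""
    if visits == 0:
        return math.inf  # always try unvisited actions first
    mean = total_return / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean + bonus


def select_action(stats, c=math.sqrt(2)):
    """Pick the action with the highest UCB1 score.

    stats maps action -> (total_return, visits), accumulated from
    simulated playouts at the current tree node (assumed layout).
    """
    parent_visits = sum(visits for _, visits in stats.values())
    return max(
        stats,
        key=lambda a: ucb1_score(stats[a][0], stats[a][1], parent_visits, c),
    )
```

Under this rule, the average return stored for an action after many playouts can stand in for the immediate reward that the game itself does not provide until the hand is over.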
Keywords/Search Tags: Q-learning algorithm, imperfect information game, artificial neural network, UCT algorithm