Research On Imperfect Information Game Based On Q Learning Algorithm

Posted on:2016-02-16

Degree:Master

Type:Thesis

Country:China

Candidate:C Li

Full Text:PDF

GTID:2348330503986906

Subject:Computer Science and Technology

Abstract/Summary:

In an imperfect information game, players have non-singleton information sets, meaning they have only partial knowledge of the game state. This makes imperfect information games more complex, competitive, and meaningful to study, and has attracted many researchers in China and abroad. A program for an imperfect information game consists of a data representation, a rule generator, a game tree, and an evaluation function. The evaluation function is the most important part: like the human brain, it judges the pros and cons of the current situation and guides the agent's choice of action. The quality of the evaluation function directly reflects the playing strength of the artificial agent, so optimizing it is of great significance for imperfect information games. In this research, the imperfect information game model is converted into a Partially Observable Markov Decision Process (POMDP), which extends the Markov Decision Process (MDP) model used in reinforcement learning. However, the Q-learning algorithm in reinforcement learning is suited only to the MDP model; when it is used as the evaluation function of an imperfect information game under the POMDP model, it causes state confusion, difficulty in representing the Q values, delayed returns, and other issues. This research studies two aspects, the value function over the state space and search over the strategy space, and proposes optimizing the evaluation function of imperfect information games with an improved Q-learning algorithm. In an imperfect information game, different actual states can produce identical observations, which leads to the state confusion problem. To solve it, the research proposes combining the sequence of consecutive observable states with eligibility traces.
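
The idea of keying values on a sequence of consecutive observations and spreading credit with eligibility traces can be sketched roughly as follows. This is a minimal tabular illustration, not the thesis's implementation: the class name `HistoryQLambda`, the window length, the accumulating-trace rule, and all hyperparameters are assumptions made for the example, and the traces are not cut on exploratory actions as in Watkins's Q(λ).

```python
from collections import defaultdict, deque


class HistoryQLambda:
    """Tabular Q-learning with eligibility traces, keyed on recent observations.

    Hypothetical sketch: the "state" is the tuple of the last `window`
    observations, so two situations with the same current observation but
    different histories get different Q-values.
    """

    def __init__(self, actions, window=3, alpha=0.1, gamma=0.9, lam=0.8):
        self.actions = list(actions)
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.q = defaultdict(float)      # (history, action) -> estimated value
        self.trace = defaultdict(float)  # (history, action) -> eligibility
        self.history = deque(maxlen=window)

    def reset(self):
        """Start a new game: forget the observation history and the traces."""
        self.history.clear()
        self.trace.clear()

    def observe(self, observation):
        """Append an observation and return the history key used as the state."""
        self.history.append(observation)
        return tuple(self.history)

    def greedy(self, key):
        """Return the action with the highest Q-value for this history key."""
        return max(self.actions, key=lambda a: self.q[(key, a)])

    def update(self, key, action, reward, next_key, done):
        """One Q(lambda)-style backup with accumulating traces."""
        best_next = 0.0 if done else max(self.q[(next_key, a)] for a in self.actions)
        delta = reward + self.gamma * best_next - self.q[(key, action)]
        self.trace[(key, action)] += 1.0
        for k in list(self.trace):
            # Every recently visited (history, action) pair shares the TD error.
            self.q[k] += self.alpha * delta * self.trace[k]
            self.trace[k] *= self.gamma * self.lam
```

With a window of two, the histories `("a", "x")` and `("b", "x")` are distinct keys even though the current observation `"x"` is the same, and a reward received at the end of a game is also credited to earlier pairs in proportion to their traces.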
The state space of an imperfect information game is huge (two-player limit Texas Hold'em poker has 3.19×10^14 information states), so the evaluation function cannot be represented by traditional Q values. To solve this, an artificial neural network is used to represent the Q values. While the game is still in progress, the return for the current action is not yet available, which causes the delayed-return problem. To solve this, the UCT (Upper Confidence Bounds applied to Trees) algorithm is used to calculate returns. This research implemented a Texas Hold'em poker system and a "Landlords" system based on the improved Q-learning algorithm. The improved Q-learning algorithm can guide the agent to select reasonable actions and is superior to traditional evaluation functions.
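
How UCT-style sampling can stand in for a return that is not yet known may be illustrated with the UCB1 rule it is built on. The sketch below is an assumption-laden simplification, not the thesis's method: it applies UCB1 only at the root (a single level of the tree), and `estimate_returns`, `rollout`, and the exploration constant `c` are hypothetical names and values chosen for the example.

```python
import math
import random


def ucb1(mean_return, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: average return plus an exploration bonus."""
    if visits == 0:
        return math.inf  # untried actions are always sampled first
    return mean_return + c * math.sqrt(math.log(parent_visits) / visits)


def estimate_returns(actions, rollout, iterations=2000, c=math.sqrt(2), seed=0):
    """Estimate each action's return by UCB1-guided sampling at the root.

    `rollout(action, rng)` is a hypothetical callback that plays the game to
    the end after `action` and returns the final payoff.
    """
    rng = random.Random(seed)
    visits = {a: 0 for a in actions}
    total = {a: 0.0 for a in actions}
    for t in range(1, iterations + 1):
        def score(a):
            mean = total[a] / visits[a] if visits[a] else 0.0
            return ucb1(mean, visits[a], t, c)

        chosen = max(actions, key=score)
        total[chosen] += rollout(chosen, rng)
        visits[chosen] += 1
    means = {a: (total[a] / visits[a] if visits[a] else 0.0) for a in actions}
    return means, visits
```

The averaged payoffs in `means` could then serve as return estimates for the current action before the game has ended, while `visits` shows how the sampling budget concentrates on the more promising actions.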

Keywords/Search Tags:

Q learning algorithm, imperfect information game, artificial neural network, UCT algorithm

Related items

1	Study Of Evaluation Algorithm In Imperfect Information Game
2	Research And Application Of Computer Game With Imperfect Information
3	Research On Imperfect Information Game Based On Counterfactual Regret Minimization Algorithm
4	Research On Multi-player Imperfect Information Computer Game Based On NFSP And ISMCTS
5	Research On Multi-Player Imperfect Information Computer Game Based On Residual Network And SDMCTS
6	Study Of Game Development Based On Artificial Neural Network And Special Effects Technique Of DirectX
7	Research On Game Algorithm Of Imperfect Information 3D Video Game Based On Deep Reinforcement Learning
8	Research On Imperfect Information Machine Game Based On Deep Reinforcement Learning In 3D Game
9	Research On Strategy Of Imperfect Information Game Based On Real-time Heuristic Search
10	Imperfect Information Games Based On Q-learning