
Study and Practice on Machine Self-Learning of Game Playing

Posted on: 2003-07-31    Degree: Master    Type: Thesis
Country: China    Candidate: J W Mo    Full Text: PDF
GTID: 2168360122960488    Subject: Computer software and theory
Abstract/Summary:
Artificial Intelligence has been one of the most active research areas in recent years, and Machine Learning and Game Playing are two of its important branches. Research on game playing is popular; in particular, IBM's chess program "Deep Blue" has reached the level of the human world champion. Existing programs, however, have various shortcomings: some need extensive training, some are limited in intelligence, some learn by rote, and some rely on large-scale search and therefore cannot avoid the danger of combinatorial explosion. A game-playing strategy that is genuinely intelligent and has an effective learning ability thus deserves further study. This thesis combines the TD (temporal-difference) algorithm with a BP (back-propagation) neural network to build a reinforcement-learning player, based on Minimax and NegaScout search, and implements it for the Five-Piece game (Gomoku). This approach overcomes the weaknesses of a static evaluation function, and experiments show that it succeeds: the program reaches a respectable playing level after a fairly short period of training.

The thesis first studies how the Five-Piece game is represented in the computer: how to store the board position, how to record the playing sequence, and how to track changes in the situation and its characteristic features (a minimal representation is sketched below).

Secondly, it studies Minimax search of the game tree, the Alpha-Beta procedure and its optimizations, and NegaScout, which uses a minimal (null) Alpha-Beta window to test where the true evaluation value lies and then re-searches only within that smaller interval; because most of the search runs in the narrow window, efficiency is improved (see the NegaScout sketch below).

Thirdly, following the characteristics of the Five-Piece game, a set of features is extracted from the position; each feature is given a weight, its occurrences are counted, and the total evaluation value is computed as a linear weighted sum (sketched below). In practice, the first version of the program, using Minimax search with this static evaluation function, played better than an average novice, though skilled amateur players could still often defeat it.

The imprecision of the static evaluation function keeps the program's intelligence low, and the function cannot be improved through learning. The second version therefore adds a self-learning capability through reinforcement learning. The idea of Temporal Difference learning is that if the evaluation function is accurate, the difference between the values of neighboring states should be close to 0, so the value of the later state is used as the training target for the prediction at the earlier state. The thesis combines the TD algorithm with a three-layer BP neural network to form a nonlinear evaluation function: the network's inputs are the counts of the board-position features, and its output is the evaluation value. During learning, the network error is computed from the TD policy evaluation and the network weights are adjusted continually, so the accuracy of the network's evaluation improves, and with it the program's playing strength (a sketch of the update step follows).

To address the slow convergence of BP, measures such as careful weight initialization are adopted to improve the network's performance. Experiments show that the program learns well and converges quickly: it beats the primary level of another program, Happy Five Piece, after 1,200 training games. For comparison, TD-Gammon (Tesauro, 1995), one of the best-known applications of reinforcement learning, defeated other programs only after 300,000 training games.
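As a concrete illustration of the representation questions above, here is a minimal Python sketch; it is our own illustration, not the thesis's data structures. A 15x15 board is held as a 2-D list, and a move list records the playing sequence, from which the side to move can be recovered.

```python
# Illustrative Gomoku board representation (not the thesis's code):
# a 15x15 grid of cell states plus the ordered sequence of moves.
EMPTY, BLACK, WHITE = 0, 1, 2
SIZE = 15

board = [[EMPTY] * SIZE for _ in range(SIZE)]
moves = []  # sequence of (row, col); even indices are Black's moves

def play(row, col):
    """Place the next stone; the side to move follows from len(moves)."""
    stone = BLACK if len(moves) % 2 == 0 else WHITE
    board[row][col] = stone
    moves.append((row, col))

play(7, 7)   # Black opens in the center
play(7, 8)   # White replies adjacently
```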
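The following is a minimal NegaScout sketch over a toy game tree (nested lists whose integer leaves are scores for the side to move there); the toy tree and the infinite initial window are illustrative assumptions, not the thesis's engine. The first move is searched with the full (alpha, beta) window; later moves are probed with a null window and re-searched only if the probe fails high inside the window.

```python
def negascout(node, alpha, beta):
    """NegaScout / PVS over a toy tree: lists are internal nodes,
    ints are leaf evaluations for the side to move at that leaf."""
    if isinstance(node, int):
        return node
    first = True
    for child in node:
        if first:
            # Principal variation: full-window search.
            score = -negascout(child, -beta, -alpha)
            first = False
        else:
            # Null-window probe: can this move beat alpha at all?
            score = -negascout(child, -alpha - 1, -alpha)
            if alpha < score < beta:
                # Probe failed high: re-search with the full window.
                score = -negascout(child, -beta, -score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # beta cutoff: the opponent avoids this line
    return alpha

# Toy tree: two root moves, each with two replies scored for the replier.
tree = [[3, 5], [6, 9]]
print(negascout(tree, float("-inf"), float("inf")))  # -> 6
```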
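A minimal sketch of the first version's linear static evaluation follows; the feature names and weights are hypothetical placeholders, not the thesis's actual values. The evaluation is simply the weighted sum of feature counts, with the pattern-scanning step that produces the counts left out.

```python
# Hypothetical feature weights (illustrative values only).
FEATURE_WEIGHTS = {
    "open_four": 10000,    # four in a row, open at one or both ends
    "closed_four": 1000,
    "open_three": 500,
    "closed_three": 50,
    "open_two": 10,
}

def evaluate(feature_counts):
    """Linear static evaluation: weighted sum of feature counts.

    feature_counts maps feature name -> occurrences in the position,
    as produced by a pattern-scanning step not shown here.
    """
    return sum(FEATURE_WEIGHTS[f] * n for f, n in feature_counts.items())

# Example: a position with one open three and two open twos.
print(evaluate({"open_three": 1, "open_two": 2}))  # -> 520
```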
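Finally, a minimal sketch of the TD + BP training step, under illustrative assumptions: the network sizes, tanh activations, learning rate, and small random initialization are our choices, not the thesis's. A three-layer network maps feature counts to a value in (-1, 1), and one back-propagation step pushes the previous position's value toward the TD target, i.e. the value predicted for the next position (or the final game result).

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES, N_HIDDEN = 8, 16
# Small random initial weights: the kind of initialization measure
# the thesis mentions for speeding up BP convergence.
W1 = rng.normal(0.0, 0.1, (N_HIDDEN, N_FEATURES))
W2 = rng.normal(0.0, 0.1, (1, N_HIDDEN))
LR = 0.05  # learning rate (illustrative)

def value(x):
    """Three-layer BP network: feature counts in, evaluation in (-1, 1) out."""
    h = np.tanh(W1 @ x)
    return np.tanh(W2 @ h)[0], h

def td_step(x_prev, target):
    """One back-propagation step moving V(x_prev) toward the TD target."""
    global W1, W2
    v, h = value(x_prev)
    delta = target - v                      # TD error: should approach 0
    g_out = delta * (1.0 - v * v)           # gradient through output tanh
    g_hid = (W2[0] * g_out) * (1.0 - h * h) # gradient through hidden tanh
    W2 += LR * g_out * h[None, :]
    W1 += LR * np.outer(g_hid, x_prev)

# During self-play: after each move, train the previous position's value
# toward the value of the new position (or the final result, +1/-1/0).
x_prev, x_next = rng.random(N_FEATURES), rng.random(N_FEATURES)
target, _ = value(x_next)   # TD(0) target = next state's prediction
td_step(x_prev, target)
```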
Keywords/Search Tags:Game-Playing, Reinforcement Learning, Five-Piece Game, Artificial Intelligence