
The Research Of Evaluation Method In Connect6 Based On BP-TD Learning

Posted on: 2010-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: X X Li
GTID: 2218330371499529
Subject: Computer software and theory
Abstract/Summary:
Search ability and the evaluation function are the two most important factors that determine the strength of a game-playing program. Connect6 has simple rules, but its state space is complex and its game tree has a large average branching factor, which limits the maximum search depth and makes evaluation all the more important. Evaluation is one of the most difficult problems in computer game playing, because the accuracy of the evaluation usually determines the quality of the next move chosen.

Because of the particular characteristics of Connect6, and to make TD learning more efficient, this thesis proposes a two-step move selection strategy. The first-stage policy assigns weights to the candidate moves based on their evaluations and the network's degree of confidence, and then selects the next move probabilistically: moves with higher weights receive higher probabilities, but every move receives a nonzero probability. The second-stage policy selects the next move with a minimax tree search.
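The abstract does not say how evaluations and network confidence are combined into weights. One minimal way to obtain the stated properties (higher-weighted moves are more likely, yet no move ever has zero probability) is a softmax (Boltzmann) distribution whose sharpness grows with confidence. The Python sketch below assumes that scheme; `selection_probabilities`, `pick_move`, the scalar `confidence`, and `beta_max` are hypothetical names, not TDConn6's actual code, and the second-stage minimax search is not shown.

```python
import math
import random


def selection_probabilities(evaluations, confidence, beta_max=10.0):
    """Turn candidate-move evaluations into selection probabilities.

    Hypothetical first-stage scheme (not taken from the thesis):
    evaluations -- network outputs for each candidate move, assumed in [0, 1]
    confidence  -- assumed scalar in [0, 1]; higher values sharpen the
                   distribution (exploitation), lower values flatten it
                   (exploration)
    beta_max    -- hypothetical upper bound on the softmax inverse temperature
    """
    beta = beta_max * confidence
    best = max(evaluations)
    # Shift by the best score before exponentiating for numerical stability.
    exps = [math.exp(beta * (v - best)) for v in evaluations]
    total = sum(exps)
    # exp() is positive, so with inputs in [0, 1] every move keeps a nonzero probability.
    return [e / total for e in exps]


def pick_move(moves, evaluations, confidence, rng=random):
    """Sample one move; higher-weighted moves are more likely but none is excluded."""
    probs = selection_probabilities(evaluations, confidence)
    return rng.choices(moves, weights=probs, k=1)[0]
```

With `confidence=0.0` the distribution is uniform (pure exploration); as `confidence` approaches 1 the probability mass concentrates on the best-evaluated moves while every candidate still remains selectable.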
Together, these two policies give TDConn6 both exploitation and exploration.

Using the method and strategy described above, this thesis implements TDConn6, which learns from 'zero knowledge'. After 30,000 rounds of training, TDConn6 played 1,000 games each against NEUConn6 and NEU6Star, achieving winning rates of 64.7% and 80.5% respectively, which shows that the method and the two-step strategy are effective and practical.
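The abstract names TD learning and a BP (back-propagation) neural network but gives neither the architecture nor the exact update rule. As a hedged illustration of how such an evaluator can learn from zero knowledge, the sketch below applies TD(λ) with eligibility traces and no discounting to a one-hidden-layer sigmoid network, in the style of TD-Gammon. The class `TDNet`, the feature vectors, the layer sizes, and the parameters `alpha` and `lam` are all hypothetical; TDConn6's actual features and settings may differ.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TDNet:
    """One-hidden-layer sigmoid network trained with TD(lambda) (hypothetical sketch).

    V(s) is assumed to be the estimated probability that a fixed side
    (e.g. Black) wins from position s, given a feature vector for s.
    """

    def __init__(self, n_in, n_hidden, alpha=0.1, lam=0.7, seed=0):
        rng = random.Random(seed)
        # Small random weights: the learner starts with no game knowledge.
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.alpha, self.lam = alpha, lam

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        v = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return v, h

    def value(self, x):
        return self.forward(x)[0]

    def gradient(self, x):
        """Backpropagate dV/dw for every weight, flattened into one list."""
        v, h = self.forward(x)
        dv = v * (1.0 - v)                          # derivative of the output sigmoid
        g = []
        for j, hj in enumerate(h):
            dh = dv * self.w2[j] * hj * (1.0 - hj)  # chain rule into hidden unit j
            g.extend(dh * xi for xi in x)           # dV/dw1[j][i]
            g.append(dh)                            # dV/db1[j]
        g.extend(dv * hj for hj in h)               # dV/dw2[j]
        g.append(dv)                                # dV/db2
        return v, g

    def apply(self, step):
        """Add a flattened update to the weights, in the same order as gradient()."""
        it = iter(step)
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] += next(it)
            self.b1[j] += next(it)
        for j in range(len(self.w2)):
            self.w2[j] += next(it)
        self.b2 += next(it)

    def train_episode(self, states, outcome):
        """Online TD(lambda) over one game; outcome is e.g. 1.0 (win) or 0.0 (loss)."""
        trace = None
        for t, x in enumerate(states):
            v, g = self.gradient(x)
            # Eligibility trace: e_t = lam * e_{t-1} + grad V(s_t) (discount factor 1).
            trace = g if trace is None else [self.lam * e + gi for e, gi in zip(trace, g)]
            # Target is the next position's value, or the game result at the end.
            target = self.value(states[t + 1]) if t + 1 < len(states) else outcome
            delta = target - v                      # TD error
            self.apply([self.alpha * delta * e for e in trace])
```

In a hypothetical self-play loop, `train_episode` would be called once per finished game with the sequence of positions and the final result, and `value` would then serve as the evaluation function consulted by both move-selection stages.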
Keywords/Search Tags: Computer Game of Connect6, Evaluation Function, TD Algorithm, BP Neural Network, Two-Step Move Selection Strategy