
Research On Valuation Function And Return Function In Incomplete Information Machine Game

Posted on: 2022-09-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y Dong
Full Text: PDF
GTID: 2480306557968539
Subject: Software engineering
Abstract/Summary:
The valuation function and the return function are important parts of an incomplete information game. The valuation function evaluates the different strategies available in the game and identifies good strategies from the evaluation results. The return (reward) function evaluates the benefit of the strategy actually adopted, and the result of that evaluation determines whether the system is rewarded or punished. The main task of a machine game is to use deep learning, reinforcement learning and other algorithms to help an agent analyze the current and future situation and choose the best move. In recent years, machine game technology has developed to the point where it can basically meet the technical requirements of complete information games, but incomplete information games remain largely unexplored. This thesis takes incomplete information games as its research subject: it first designs a valuation function based on a deep residual network, and then designs a return function based on Monte Carlo tree search. The main contributions of this thesis are the following three:

(1) In traditional machine games, a deep neural network is generally used to predict the opponent's actions. Building on the original convolutional neural network, this thesis improves the model of the deep neural network valuation algorithm by training it as a deep residual network. The model further learns the game strategies of expert players, uses them as a reference for the agent's own actions, and predicts the opponent's behavior in the game.

(2) The bottleneck of the Monte Carlo tree search algorithm in incomplete information machine games is analyzed. Because computation time is limited, the number of simulation evaluations is relatively small, and the child nodes of the root node are visited too few times to reflect the corresponding distribution of returns. A breadth-first initialization algorithm is therefore proposed: the algorithm expands all child nodes of the root node breadth-first, and then carries out K Monte Carlo simulation evaluations for each child node to deal with the non-deterministic distribution of the node's return. Experiments show that the algorithm mitigates the problems of too few simulations and too much randomness.

(3) In incomplete information machine games, time and space limits mean that the whole game tree cannot be expanded under unknown information to obtain the best result. The search tree from the previous round can instead be reused, with the data from that round either attenuated or reset. Reusing the search tree compensates for the small number of simulation evaluations caused by the shortage of time, while attenuation limits the effect of earlier data on the current decision. To address these problems, a return function based on Monte Carlo tree search with search tree reuse is proposed.
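The abstract does not give implementation details for points (2) and (3), so the following is only a minimal Python sketch of the two ideas as described: expanding every child of the root and running K simulations on each, and carrying a subtree over to the next round with attenuated statistics. All names (`Node`, `breadth_first_init`, `decay`, `reuse_subtree`, `simulate`, `gamma`) are hypothetical and are not taken from the thesis; the single multiplicative factor `gamma` is an assumed form of attenuation, with `gamma = 0` standing in for a reset.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, Iterable


@dataclass
class Node:
    """Search-tree node holding a visit count and an accumulated return."""
    visits: float = 0.0
    total_return: float = 0.0
    children: Dict[Hashable, "Node"] = field(default_factory=dict)

    @property
    def mean_return(self) -> float:
        # Average simulated return; 0.0 for a node that was never visited.
        return self.total_return / self.visits if self.visits > 0 else 0.0


def breadth_first_init(
    root: Node,
    actions: Iterable[Hashable],
    simulate: Callable[[Hashable], float],
    k: int,
) -> None:
    """Expand every root action, then run k simulations for each child.

    `simulate(action)` is an assumed callback that plays one random
    rollout after `action` and returns its outcome as a number.
    """
    for action in actions:
        child = root.children.setdefault(action, Node())
        for _ in range(k):
            outcome = simulate(action)
            child.visits += 1
            child.total_return += outcome
            # Back up the result to the root as well.
            root.visits += 1
            root.total_return += outcome


def decay(node: Node, gamma: float) -> None:
    """Scale all statistics in the subtree by gamma (0 <= gamma <= 1)."""
    stack = [node]
    while stack:
        current = stack.pop()
        current.visits *= gamma
        current.total_return *= gamma
        stack.extend(current.children.values())


def reuse_subtree(root: Node, played_action: Hashable, gamma: float) -> Node:
    """Return the subtree under `played_action` as the next root.

    The carried-over statistics are attenuated by gamma; if the action
    was never expanded, a fresh empty node is returned instead.
    """
    new_root = root.children.get(played_action)
    if new_root is None:
        return Node()
    decay(new_root, gamma)
    return new_root
```

In this sketch, attenuation with `gamma > 0` keeps each node's mean return unchanged but lowers its visit count, so new simulations outweigh the inherited ones more quickly. A full implementation would also need a selection rule, deeper expansion, and a way to handle hidden information, none of which the abstract specifies.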
Keywords/Search Tags:Deep reinforcement learning, Incomplete information game, Residual network, MCTS