Research And Application Of Self Learning Strategy-Value-Risk Model

Posted on: 2021-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: R F Chen
Full Text: PDF
GTID: 2428330620464280
Subject: Engineering
Abstract/Summary:
Artificial intelligence (AI) is an important research field in computer science. It studies how to make machines perceive their environment as humans do and carry out complex work in place of humans. Computer game playing is an important research direction within AI. Researchers have found that combining deep learning, reinforcement learning, and self-learning can produce very strong game-playing systems. AlphaGo is one such system, and it has mastered one of the most difficult board games played by humans. In studying such systems, we found two shortcomings. First, when the algorithm recommends the best action, it does not report the risk the current player faces. Second, the algorithm assumes that the two sides move alternately and does not handle consecutive moves by the same player, so it lacks generality. A game-playing algorithm with risk prediction and greater generality has more practical significance and broader application prospects. Against this background, and inspired by the AlphaGo algorithm, this thesis studies the strategy network, the value network, and the Monte Carlo tree search algorithm, proposes a self-learning strategy-value-risk network model algorithm, and designs a simulation system to verify it. The main contents and innovations of this thesis are as follows:

(1) A risk network model algorithm is presented. It addresses the insufficient explanation of recommended actions in game systems. Using deep learning, the risk network extracts features of the game state with a convolutional neural network, passes them through several fully connected layers, and is trained on labeled data. The labels contain various risk parameters. After training on a large amount of data, the risk network can predict various risk estimates for an input game state and explain, from the perspective of risk, why an action is recommended.

(2) A self-learning strategy-value-risk network model algorithm is presented. Training a strategy network, a value network, and a risk network separately takes a long time and a large amount of computing resources. This thesis proposes merging the three into a single strategy-value-risk network, which can be trained in one pass, saving training time and reducing training cost. A self-learning strategy-value-risk network is then built by combining Monte Carlo tree search with the strategy-value-risk network. This makes the algorithm independent of external game records and removes the need to collect sample data.

(3) A new Monte Carlo tree search algorithm is presented. The original algorithm supports only two players moving alternately and therefore lacks generality. This thesis proposes a Monte Carlo tree search algorithm that supports consecutive moves. It adds node identification to the original algorithm and improves node expansion and backpropagation. The improved algorithm applies to a wider range of game scenarios and is no longer restricted to alternating moves.

(4) A simple system is designed and implemented to verify the self-learning strategy-value-risk network algorithm. This thesis builds a simulation system based on a poker game scenario, and designs and implements its overall structure and each of its models.
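The idea in contribution (3), tagging each tree node so that backpropagation does not rely on alternating moves, can be illustrated with a minimal sketch. This is not the thesis's implementation: the `Node` layout, the `mover` tag, the `backpropagate` signature, and the two-player zero-sum assumption are all hypothetical choices made for illustration. A common alternating-move implementation flips the sign of the value at every tree level; here the sign is instead chosen from the tag stored on each node, so a player who moves twice in a row is credited correctly.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """Search-tree node tagged with the player whose action led to it.

    The `mover` tag is an assumed stand-in for the "node identification"
    mentioned in the abstract; the thesis may define it differently.
    """
    mover: Optional[int]                      # None for the root
    parent: Optional["Node"] = None
    children: Dict[str, "Node"] = field(default_factory=dict)
    visits: int = 0
    value_sum: float = 0.0

    def expand(self, action: str, mover: int) -> "Node":
        """Add a child reached by `action`, recording who played it."""
        child = Node(mover=mover, parent=self)
        self.children[action] = child
        return child

    @property
    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def backpropagate(leaf: Node, value: float, value_for: int) -> None:
    """Propagate a leaf evaluation up to the root.

    `value` is the outcome from the perspective of player `value_for`.
    Each node accumulates the value as seen by its own `mover`, so the
    sign depends on the node tag, not on tree depth (two-player,
    zero-sum assumption).
    """
    node: Optional[Node] = leaf
    while node is not None:
        node.visits += 1
        if node.mover is not None:
            node.value_sum += value if node.mover == value_for else -value
        node = node.parent


# Player 0 moves twice in a row, then player 1 moves.
root = Node(mover=None)
first = root.expand("a", mover=0)
second = first.expand("b", mover=0)
reply = second.expand("c", mover=1)
backpropagate(reply, value=1.0, value_for=0)
```

After this call, both of player 0's consecutive nodes accumulate +1 and player 1's node accumulates -1, whereas a depth-based sign flip would credit one of player 0's two nodes with the wrong sign.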
Keywords/Search Tags: self-learning, reinforcement learning, strategy-value-risk model, Monte Carlo tree search with continuous actions