
The Design And Implementation Of Algorithms For Board Games Based On Deep Reinforcement Learning

Posted on: 2019-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Deng
Full Text: PDF
GTID: 2348330545975853
Subject: Engineering
Abstract/Summary:
With the continuous development of artificial intelligence, deep reinforcement learning (DRL) has received increasing attention from researchers due to its unique advantages. By combining deep learning (DL) and reinforcement learning (RL), DRL not only gives reinforcement learning agents end-to-end learning capabilities in high-dimensional environments, but also makes it possible to further improve model performance even in the absence of training samples. Despite the great progress made so far, because of the complexity inherited from both DL and RL, DRL still suffers from unstable training, low sample efficiency, difficulty in reproducing results, high hyperparameter sensitivity, and difficulty in escaping from local optima when faced with complex learning tasks such as board games and video games.

This thesis proposes DRL approaches for board games based on convolutional neural networks and the Upper Confidence Bound Applied to Trees (UCT) algorithm, focusing primarily on the problems mentioned above. The work consists of the following three aspects:

(1) To improve the quality of samples in the training process, an effective method for training board game agents using the search results of the UCT algorithm is proposed. The algorithm uses UCT to reassess the sampling trajectories of the neural network and thereby correct the neural network's deviations. As the neural network improves, this is equivalent to reducing the search space of UCT and improving its efficiency. (A minimal sketch of this kind of search-guided training step is given after the abstract.)

(2) The method that combines a neural network with Monte Carlo tree search (MCTS) not only requires a large number of training samples, but also finds it difficult to escape misguided search trajectories caused by deviations introduced during training. To address this problem, a learning algorithm that incorporates bootstrap aggregating (bagging) is proposed. The algorithm makes nearly full use of the training data generated from self-play and allows multiple neural networks to participate in learning and exploration, which ensures the diversity of the search trajectories, thereby improving the stability of the algorithm and reducing the risk of prematurely getting trapped in local optima.

(3) To prevent neural network deviations from degrading the performance of the UCT algorithm, and to make full use of all the models trained by the algorithm mentioned above, a UCT algorithm with combined strategies is proposed. The new algorithm not only provides a natural multithreaded modification of the UCT algorithm, but also improves its accuracy through an asynchronous search method.

The proposed methods are tested and compared in a series of experiments, and the experimental results confirm their effectiveness.
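For illustration only, the following is a minimal Python sketch of the kind of search-guided training step described in aspect (1): UCT/PUCT simulations are run from a position with the neural network providing priors and leaf evaluations, and the normalized visit counts at the root are returned as an improved policy target. All names here (Node, run_uct, the game-state interface with clone/apply/legal_moves, and net.evaluate) are hypothetical and are not taken from the thesis; the exploration constant and other details are assumptions.

    # Sketch only: assumed game-state interface (clone, apply, legal_moves)
    # and network interface (evaluate -> (prior dict, value)); not the
    # thesis's actual code.
    import math

    C_PUCT = 1.5  # exploration constant in the PUCT selection rule (assumed)

    class Node:
        def __init__(self, prior):
            self.prior = prior      # P(s, a) from the neural network
            self.visits = 0         # N(s, a)
            self.value_sum = 0.0    # W(s, a)
            self.children = {}      # move -> Node

        def q(self):
            return self.value_sum / self.visits if self.visits else 0.0

    def select_child(node):
        """PUCT variant: argmax Q + c * P * sqrt(N_parent + 1) / (1 + N_child)."""
        total = sum(child.visits for child in node.children.values())
        def score(item):
            _, child = item
            u = C_PUCT * child.prior * math.sqrt(total + 1) / (1 + child.visits)
            return child.q() + u
        return max(node.children.items(), key=score)

    def expand(node, state, net):
        """Let the network evaluate a leaf and attach prior-weighted children."""
        priors, value = net.evaluate(state)  # hypothetical network API
        for move in state.legal_moves():     # empty at terminal states
            node.children[move] = Node(prior=priors.get(move, 1e-3))
        return value                         # a real implementation would use the
                                             # game outcome at terminal states

    def run_uct(root_state, net, n_simulations=100):
        """Run simulations and return normalized root visit counts as a policy target."""
        root = Node(prior=1.0)
        expand(root, root_state, net)
        for _ in range(n_simulations):
            node, state, path = root, root_state.clone(), [root]
            # 1) Selection: walk down the tree with the PUCT rule.
            while node.children:
                move, node = select_child(node)
                state.apply(move)
                path.append(node)
            # 2) Expansion + evaluation at the leaf.
            value = expand(node, state, net)
            # 3) Backup: propagate the value along the visited path,
            #    flipping the sign for the alternating player.
            for n in reversed(path):
                n.visits += 1
                n.value_sum += value
                value = -value
        total = sum(c.visits for c in root.children.values())
        return {m: c.visits / total for m, c in root.children.items()}

In a self-play pipeline of the kind the abstract describes, the (state, visit-count policy, game outcome) tuples produced this way would be used to retrain the network, so that the search corrects the network's deviations while the improving network narrows the search. The bagging variant in aspect (2) would maintain several such networks and draw on all of them during learning and exploration; the exact combination rule is not specified in this abstract.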
Keywords/Search Tags: Deep Reinforcement Learning, Monte Carlo Method, Board Games, Ensemble Learning