Font Size: a A A

Application Of Deep Reinforcement Learning In Video Game Playing

Posted on:2016-01-24Degree:MasterType:Thesis
Country:ChinaCandidate:L W QiuFull Text:PDF
GTID:2308330479494718Subject:Computer technology
Abstract/Summary:PDF Full Text Request
How to let agents learn directly from high-dimensional sensory inputs like speech and vision is one of the long-standing challenges of reinforcement learning(RL) domains. Many successful RL applicationsbased on hand-crafted features combined with linear valuefunctions or policy representations.Obviously, the performance of those systems gravely relies on the quality of the feature representation.Now it is possible to extract high-level features from raw sensory data which is high dimensional because of the advances in deep learning(DL), which leading to breakthroughs in computer vision and speech recognition.These methods use a series of neural network architectures, including convolutional networks,multilayer perceptrons, restricted Boltzmann machines and recurrent neural networks. Also these methods have exploited both supervised and unsupervised learning. It let people curious about whether such techniques could also be beneficial for RL domain.However reinforcement learning presents several challenges from a deep learning perspective. Firstly, most successful deep learning applications have required large amounts of hand-labelled training data. On the other hand, RL algorithms must be able to learn from a scalar rewardsignal that is usually sparse, noisy and delayed. Another issue is that most deeplearning algorithms assume the data samples to be independent, while in reinforcement learning onetypically encounters sequences of highly correlated states. Furthermore, in RL the data distribution changes as the algorithm learns new behaviours, which can be problematic for deep learningmethods that assume a fixed underlying distribution.This paper tries to use the following methodtoovercome these problems. Firstly, we design a type of deep neural network architecture according to our task, and this network can learn successfully control policies from raw video data in complex RL environments. Also,we use model combination method which combines eight types of model that use difference architecture of deep neural network to steady model’s action policy and improve model’s performance. Furthermore, the network istrained with an improved Q-learning algorithm that use a special sample method that samples training data from a lot of historical experiences,and it use mini-batch L-BFGS to update the weights.Experiments shows that, after training with the improved Q-Learning method, RL model which combines deep neural network can exact high level feature and cansuccessful learn control policies with a steady manner. The performance on the video game of this model compared to traditional RL model and NFQ model has significantly improved and the game score of our model in four out of six test game outperformance human player. Game performance after model combinationing were better than the performance of a single model.
Keywords/Search Tags:Reinforement Learning, Deep Learning, Model Combination, Video Game
PDF Full Text Request
Related items