
Research on the Curse of Dimensionality in Reinforcement Learning

Posted on: 2011-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: Q C Yan
GTID: 2178360305976161
Subject: Computer application technology
Abstract
Reinforcement learning suffers from two common and serious problems: the "curse of dimensionality" (the state space grows exponentially with the number of features) and slow convergence. This thesis attacks both problems from three angles: the reward function, hierarchical reinforcement learning, and function approximation. We propose a heuristic reward function method based on hierarchical reinforcement learning and a reinforcement learning algorithm based on neural networks. On these theoretical foundations, we developed Tetris, Mountain Car, and Grid World experimental platforms; experiments and analysis of the experimental data further validate the correctness and effectiveness of the algorithms.

The main research results are as follows:

(1) We propose a heuristic reward function algorithm based on hierarchical reinforcement learning and give a theoretical proof of its convergence. Applying heuristic rewards to the sub-tasks of hierarchical reinforcement learning greatly increases the agent's learning speed, so the algorithm not only alleviates the "curse of dimensionality" but also speeds up convergence of the task (a sketch of the idea follows this abstract).

(2) We developed a Tetris game experiment platform and applied the heuristic-reward hierarchical reinforcement learning algorithm to it. The experimental results show that the algorithm significantly reduces the environmental state space, alleviating the "curse of dimensionality" to a certain extent, while also converging quickly.

(3) To further address the "curse of dimensionality", we put forward a new algorithm, QL-BP, which applies a BP neural network to reinforcement learning. By exploiting the neural network's powerful function-approximation ability, the learning system can produce a correct state-value function V(s) or action-value function Q(s, a) without traversing every state or state-action pair, which reduces space complexity significantly (a second sketch follows this abstract).

(4) Early in learning, QL-BP oscillates strongly and converges slowly, and later in learning it overfits because of large errors in the experimental samples. We therefore present an improved QL-BP algorithm. Experimental results show that the improved algorithm converges faster in the early stage and largely eliminates the later overfitting.

(5) We developed Mountain Car and Grid World experiment platforms and applied both the QL-BP algorithm and the improved QL-BP algorithm to them. Experimental results show that both algorithms outperform the Q(λ) algorithm in space complexity and alleviate the "curse of dimensionality" to a certain extent.
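The abstract does not spell out the heuristic reward formulation used in (1); as a minimal sketch, the following shows one way a heuristic shaping term can enter a sub-task's tabular Q-learning update. The function heuristic_bonus is a hypothetical stand-in for the thesis's domain-knowledge reward, and the learning rate and discount factor are illustrative defaults.

```python
from collections import defaultdict

def q_update_with_heuristic(Q, s, a, r, s_next, actions,
                            heuristic_bonus, alpha=0.1, gamma=0.9):
    # Add a heuristic shaping term (domain knowledge) to the raw reward,
    # then do one standard tabular Q-learning update for the sub-task.
    shaped_r = r + heuristic_bonus(s, a, s_next)
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (shaped_r + gamma * best_next - Q[(s, a)])

# Usage with a toy heuristic that rewards moving toward state 0:
Q = defaultdict(float)
actions = [0, 1]
bonus = lambda s, a, s2: -0.1 * abs(s2)   # hypothetical shaping term
q_update_with_heuristic(Q, s=3, a=1, r=1.0, s_next=2,
                        actions=actions, heuristic_bonus=bonus)
```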
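For QL-BP in (3), the abstract specifies only that a BP network approximates the value function. The sketch below is one plausible reading under stated assumptions: a single hidden layer with tanh activations and an illustrative learning rate, with the network mapping a state vector to one Q-value per action and each step backpropagating the TD error on the chosen action.

```python
import numpy as np

class QLBP:
    """Minimal sketch of Q-learning with a BP network approximating
    Q(s, a). Layer sizes, activations, and learning rate are
    illustrative assumptions, not the thesis's exact architecture."""

    def __init__(self, n_in, n_hidden, n_actions, lr=0.01, gamma=0.9):
        rng = np.random.default_rng(0)
        self.W1 = rng.normal(0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (n_hidden, n_actions))
        self.b2 = np.zeros(n_actions)
        self.lr, self.gamma = lr, gamma

    def forward(self, s):
        h = np.tanh(s @ self.W1 + self.b1)    # hidden activations
        return h, h @ self.W2 + self.b2       # Q-values, one per action

    def update(self, s, a, r, s_next, done):
        h, q = self.forward(s)
        _, q_next = self.forward(s_next)
        target = r if done else r + self.gamma * q_next.max()
        err = q[a] - target                   # TD error on chosen action
        # Backpropagate the squared TD error through both layers.
        dq = np.zeros_like(q); dq[a] = err
        dh = (dq @ self.W2.T) * (1 - h**2)    # tanh derivative
        self.W2 -= self.lr * np.outer(h, dq)
        self.b2 -= self.lr * dq
        self.W1 -= self.lr * np.outer(s, dh)
        self.b1 -= self.lr * dh

# Usage on a 2-dimensional state (e.g. Mountain Car position, velocity):
net = QLBP(n_in=2, n_hidden=16, n_actions=3)
s = np.array([-0.5, 0.0]); s2 = np.array([-0.48, 0.01])
net.update(s, a=2, r=-1.0, s_next=s2, done=False)
```

Because the network generalizes across states, the agent stores only the weight matrices rather than a table over all state-action pairs, which is the space-complexity saving over Q(λ) claimed in (3) and (5).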
Keywords: reinforcement learning, hierarchical reinforcement learning, neural networks, Tetris, "curse of dimensionality"