
Research On Parallel Reinforcement Learning

Posted on: 2013-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: X D Yang
Full Text: PDF
GTID: 2248330371493528
Subject: Computer software and theory
Abstract/Summary:
Reinforcement learning is an important machine learning method that has proved highly effective in fields such as robotics, economics, industrial manufacturing and games. However, many off-the-shelf reinforcement learning algorithms scale poorly: their cost grows rapidly as the state space of the problem grows, and they have difficulty handling problems with continuous state spaces. Slow convergence is a further obstacle to applying reinforcement learning in the real world.

To address the "curse of dimensionality" and the slow convergence of reinforcement learning in large or continuous state spaces, several parallel reinforcement learning methods are proposed. The main research content is as follows:

i. A scalable parallel reinforcement learning method based on state space division and intelligent scheduling is proposed. The learning problem with a large or continuous state space is decomposed into smaller subproblems, each of which can be learned in parallel. During learning, an adaptive intelligent scheduling algorithm selects which subproblems to work on, ensuring that computation is focused on the regions of the problem space expected to be most productive. Once the subproblems are completed, their partial results are combined to obtain the desired overall result. The convergence of Q-learning under the proposed method is also proved.

ii. To improve the efficiency of temporal credit assignment in online learning tasks with delayed reward, and to accelerate the convergence of reinforcement learning algorithms with eligibility traces, a parallel reinforcement learning framework is proposed, together with several optimizations of the framework. The framework exploits the parallelism inherent in algorithms with eligibility traces: multiple computing nodes jointly take charge of the value function and the eligibility traces.

iii. In practical applications, especially problems with large state spaces, the time the E^3 algorithm needs to converge to a near-optimal policy is too long for the algorithm to be considered efficient, as its theoretical bounds show. This thesis shows how the algorithm can be improved by replacing the exploration phase with a parallel sampling method, in which multiple agents explore in parallel, making it better suited to problems with large state spaces. In the exploitation phase, learned experience is reused to make value function updates more efficient and thus speed up convergence.
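As a concrete illustration of the state space division and intelligent scheduling idea in (i), the sketch below partitions a small deterministic chain MDP into blocks and lets a priority-driven scheduler choose which block to sweep next. The chain MDP, the block partition, the residual-based priorities and all names are illustrative assumptions, not the thesis's actual algorithm.

```python
N_STATES = 12          # chain MDP: states 0..11, reward on reaching state 11
ACTIONS = (-1, +1)     # move left / move right
GAMMA = 0.9
N_BLOCKS = 3           # partition the state space into 3 subproblems

def step(s, a):
    """Deterministic chain dynamics with reward 1 at the right end."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
blocks = [range(b * N_STATES // N_BLOCKS, (b + 1) * N_STATES // N_BLOCKS)
          for b in range(N_BLOCKS)]
priority = [1.0] * N_BLOCKS   # scheduler state: one priority per subproblem

def sweep(block):
    """One synchronous Q-value sweep over a block; returns the largest
    Bellman residual seen, which becomes the block's new priority."""
    residual = 0.0
    for s in block:
        for a in ACTIONS:
            s2, r = step(s, a)
            target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
            residual = max(residual, abs(target - Q[(s, a)]))
            Q[(s, a)] = target
    return residual

for _ in range(200):
    # intelligent scheduling: learn the subproblem expected to be most productive
    k = max(range(N_BLOCKS), key=lambda i: priority[i])
    priority[k] = sweep(blocks[k])
    for j in (k - 1, k + 1):      # a block's updates can affect its neighbours
        if 0 <= j < N_BLOCKS:
            priority[j] += priority[k]

# combining the subproblems' partial results: the overall greedy policy
greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
```

Blocks whose neighbours have just changed inherit some priority, so computation concentrates where Bellman residuals are largest; in a parallel deployment each block would be a worker's subproblem and the scheduler would dispatch sweeps to idle workers.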
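The parallelism described in (ii) is visible in tabular TD(lambda): after every transition, every state's eligibility trace decays and every state's value is nudged by the same TD error, so the value function and traces can be split across computing nodes. A minimal single-process sketch, assuming a deterministic chain and a fixed move-right policy (the setup and constants are illustrative):

```python
N = 6                        # chain: states 0..4, state 5 terminal
GAMMA, LAM, ALPHA = 1.0, 0.8, 0.2

V = [0.0] * N                # V[N - 1] is terminal and stays 0

for _ in range(200):         # episodes under the fixed "move right" policy
    e = [0.0] * N            # eligibility traces, reset each episode
    for s in range(N - 1):
        s2 = s + 1
        r = 1.0 if s2 == N - 1 else 0.0
        delta = r + GAMMA * V[s2] - V[s]   # TD error for this transition
        e[s] += 1.0                        # accumulating trace
        # This all-states loop is the inherently parallel part: each
        # computing node can own a slice of V and e and update it locally,
        # needing only the scalar TD error delta to be broadcast.
        for x in range(N - 1):
            V[x] += ALPHA * delta * e[x]
            e[x] *= GAMMA * LAM
```

Under this policy every non-terminal state's true value is 1.0, and the trace loop dominates the per-step cost, which is why distributing it pays off as the state space grows.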
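For the parallel sampling idea in (iii), the sketch below runs several exploring agents (sequentially here, for clarity; a real system would run them on separate workers) using E^3-style balanced wandering, i.e. always taking the least-tried action, and pools their visit counts so states cross the "known" threshold sooner. The chain dynamics, threshold, budget and start states are illustrative assumptions, not the thesis's exact method.

```python
from collections import Counter

N_STATES = 8
ACTIONS = (-1, +1)
M_KNOWN = 6                  # a state counts as "known" after M_KNOWN visits
STEPS = 40                   # exploration budget per agent

def step(s, a):
    """Deterministic chain dynamics."""
    return min(max(s + a, 0), N_STATES - 1)

def balanced_wandering(start, steps):
    """E^3-style exploration: in each state, take the action tried least."""
    tried, visits = Counter(), Counter()
    s = start
    for _ in range(steps):
        visits[s] += 1
        a = min(ACTIONS, key=lambda x: tried[(s, x)])
        tried[(s, a)] += 1
        s = step(s, a)
    return visits

# One exploring agent per start state; pooling their counts is the
# "parallel sampling" step that replaces single-agent exploration.
starts = [0, N_STATES // 2, N_STATES - 1]
per_agent = [balanced_wandering(s0, STEPS) for s0 in starts]

pooled = Counter()
for v in per_agent:
    pooled += v

known_pooled = {s for s in range(N_STATES) if pooled[s] >= M_KNOWN}
known_single = {s for s in range(N_STATES) if per_agent[0][s] >= M_KNOWN}
```

Since pooled counts dominate any single agent's counts, the pooled known set always contains the single-agent known set, which is the sense in which parallel sampling can only shorten the exploration phase.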
Keywords/Search Tags: parallel reinforcement learning, state space decomposition, eligibility trace, parallel sampling, learning experience reuse