
Research On Parallel Reinforcement Learning

Posted on: 2013-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: X D Yang
Full Text: PDF
GTID: 2248330371493528
Subject: Computer software and theory
Abstract/Summary:
Reinforcement learning is an important machine learning method that has proved highly effective in fields such as robotics, economics, industrial manufacturing and games. However, many off-the-shelf reinforcement learning algorithms scale poorly: their cost grows rapidly as the state space of the problem grows, and they have difficulty handling problems with continuous state spaces. Slow convergence is a further obstacle to applying reinforcement learning in the real world.

To address the "curse of dimensionality" and the slow convergence of reinforcement learning in large or continuous state spaces, several parallel reinforcement learning methods are proposed. The main research content is as follows:

i. A scalable parallel reinforcement learning method based on state space division and intelligent scheduling is proposed. The learning problem with a large or continuous state space is decomposed into smaller subproblems, each of which can be learned in parallel. During learning, an adaptive intelligent scheduling algorithm selects which subproblems to work on, ensuring that computation is focused on the regions of the problem space expected to be most productive. Once the subproblems are completed, their partial results are combined to obtain the desired overall result. The convergence of Q-learning under the proposed method is also proved.

ii. To improve the efficiency of temporal credit assignment in online learning tasks with delayed reward, and to accelerate the convergence of reinforcement learning algorithms with eligibility traces, a parallel reinforcement learning framework is proposed, together with several optimizations of the framework. The framework exploits the parallelism inherent in algorithms with eligibility traces: multiple computing nodes jointly take charge of the value function and the eligibility traces.

iii. In practical applications, especially problems with large state spaces, the time the E^3 algorithm needs to converge to a near-optimal policy is too long for the algorithm to be considered efficient, as its theoretical bounds show. This thesis shows how the algorithm can be improved by replacing the exploration phase with a parallel sampling method, in which multiple agents explore in parallel, making it better suited to problems with large state spaces. In the exploitation phase, learned experience is reused to make value function updates more efficient and thus speed up convergence.
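As a concrete illustration of the state space division and intelligent scheduling idea in (i), the sketch below partitions a small deterministic chain MDP into blocks and lets a priority-driven scheduler choose which block to sweep next. The chain MDP, the block partition, the residual-based priorities and all names are illustrative assumptions, not the thesis's actual algorithm.

```python
N_STATES = 12          # chain MDP: states 0..11, reward on reaching state 11
ACTIONS = (-1, +1)     # move left / move right
GAMMA = 0.9
N_BLOCKS = 3           # partition the state space into 3 subproblems

def step(s, a):
    """Deterministic chain dynamics with reward 1 at the right end."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
blocks = [range(b * N_STATES // N_BLOCKS, (b + 1) * N_STATES // N_BLOCKS)
          for b in range(N_BLOCKS)]
priority = [1.0] * N_BLOCKS   # scheduler state: one priority per subproblem

def sweep(block):
    """One synchronous Q-value sweep over a block; returns the largest
    Bellman residual seen, which becomes the block's new priority."""
    residual = 0.0
    for s in block:
        for a in ACTIONS:
            s2, r = step(s, a)
            target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
            residual = max(residual, abs(target - Q[(s, a)]))
            Q[(s, a)] = target
    return residual

for _ in range(200):
    # intelligent scheduling: learn the subproblem expected to be most productive
    k = max(range(N_BLOCKS), key=lambda i: priority[i])
    priority[k] = sweep(blocks[k])
    for j in (k - 1, k + 1):      # a block's updates can affect its neighbours
        if 0 <= j < N_BLOCKS:
            priority[j] += priority[k]

# combining the subproblems' partial results: the overall greedy policy
greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
```

Blocks whose neighbours have just changed inherit some priority, so computation concentrates where Bellman residuals are largest; in a parallel deployment each block would be a worker's subproblem and the scheduler would dispatch sweeps to idle workers.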
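The parallelism described in (ii) is visible in tabular TD(lambda): after every transition, every state's eligibility trace decays and every state's value is nudged by the same TD error, so the value function and traces can be split across computing nodes. A minimal single-process sketch, assuming a deterministic chain and a fixed move-right policy (the setup and constants are illustrative):

```python
N = 6                        # chain: states 0..4, state 5 terminal
GAMMA, LAM, ALPHA = 1.0, 0.8, 0.2

V = [0.0] * N                # V[N - 1] is terminal and stays 0

for _ in range(200):         # episodes under the fixed "move right" policy
    e = [0.0] * N            # eligibility traces, reset each episode
    for s in range(N - 1):
        s2 = s + 1
        r = 1.0 if s2 == N - 1 else 0.0
        delta = r + GAMMA * V[s2] - V[s]   # TD error for this transition
        e[s] += 1.0                        # accumulating trace
        # This all-states loop is the inherently parallel part: each
        # computing node can own a slice of V and e and update it locally,
        # needing only the scalar TD error delta to be broadcast.
        for x in range(N - 1):
            V[x] += ALPHA * delta * e[x]
            e[x] *= GAMMA * LAM
```

Under this policy every non-terminal state's true value is 1.0, and the trace loop dominates the per-step cost, which is why distributing it pays off as the state space grows.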
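For the parallel sampling idea in (iii), the sketch below runs several exploring agents (sequentially here, for clarity; a real system would run them on separate workers) using E^3-style balanced wandering, i.e. always taking the least-tried action, and pools their visit counts so states cross the "known" threshold sooner. The chain dynamics, threshold, budget and start states are illustrative assumptions, not the thesis's exact method.

```python
from collections import Counter

N_STATES = 8
ACTIONS = (-1, +1)
M_KNOWN = 6                  # a state counts as "known" after M_KNOWN visits
STEPS = 40                   # exploration budget per agent

def step(s, a):
    """Deterministic chain dynamics."""
    return min(max(s + a, 0), N_STATES - 1)

def balanced_wandering(start, steps):
    """E^3-style exploration: in each state, take the action tried least."""
    tried, visits = Counter(), Counter()
    s = start
    for _ in range(steps):
        visits[s] += 1
        a = min(ACTIONS, key=lambda x: tried[(s, x)])
        tried[(s, a)] += 1
        s = step(s, a)
    return visits

# One exploring agent per start state; pooling their counts is the
# "parallel sampling" step that replaces single-agent exploration.
starts = [0, N_STATES // 2, N_STATES - 1]
per_agent = [balanced_wandering(s0, STEPS) for s0 in starts]

pooled = Counter()
for v in per_agent:
    pooled += v

known_pooled = {s for s in range(N_STATES) if pooled[s] >= M_KNOWN}
known_single = {s for s in range(N_STATES) if per_agent[0][s] >= M_KNOWN}
```

Since pooled counts dominate any single agent's counts, the pooled known set always contains the single-agent known set, which is the sense in which parallel sampling can only shorten the exploration phase.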
Keywords/Search Tags: parallel reinforcement learning, state space decomposition, eligibility trace, parallel sampling, learning experience reuse