
Study Of Reinforcement Learning Algorithms Based On Value Function Approximation

Posted on: 2014-09-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X G Chen
Full Text: PDF
GTID: 1268330425468267
Subject: Computer application technology

Abstract/Summary:
In recent years, more and more machine learning researchers have turned to reinforcement learning. For problems with both a small-scale state space and a small-scale action space, the classic table-based reinforcement learning algorithms have been proved to converge and perform well in experiments.

In practice, however, reinforcement learning problems usually come with large-scale or continuous state spaces, and sometimes with continuous action spaces as well, e.g., steering control in automatic driving. This brings the "curse of dimensionality", which challenges the classic table-based algorithms in both memory consumption and learning efficiency. A common solution is to combine classic reinforcement learning algorithms with function approximation methods in order to enhance the abstraction and generalization ability over the state and action spaces.

With respect to function approximation, the main work and contributions of this thesis are as follows:

(1) A short introduction to the reinforcement learning model is given, followed by surveys of reinforcement learning algorithms based on linear function approximation and of kernel-based reinforcement learning algorithms.

(2) The Temporal Difference (TD) learning family tries to learn a least-squares solution for an approximate Linear Value Function (LVF). However, owing to the limited representational ability of the features in the LVF, the prediction error of the learned LVF is only bounded by the residual between the optimal value function and its projection, where the projection operator is Π = Φ(Φ^T D Φ)^{-1} Φ^T D, with Φ the feature matrix and D the diagonal matrix of the stationary state distribution. We find that this error can be very large if the feature function Φ is not well designed. To deal with this problem, Temporal Difference learning with a Piecewise Linear Value Function (PLVF-TD) is proposed to further decrease the error bound. PLVF-TD has two steps: (i) building the piecewise linear basis for problems of different dimensions; (ii) learning the parameters via temporal difference learning, whose complexity is O(n). The error bound is proved to decrease to zero as the size of the piecewise basis goes to infinity; a sketch of such an update is given below.
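To make step (ii) concrete, here is a minimal sketch of a TD(0) update over a one-dimensional piecewise linear ("hat function") basis. It is not the thesis's implementation: the knot-based basis construction, the function names, and the step-size and discount values are illustrative assumptions.

```python
import numpy as np

def piecewise_linear_basis(s, knots):
    """Hat (tent) functions over sorted 1-D knots: each basis function is 1 at its
    own knot, decays linearly to 0 at the neighbouring knots, and is 0 elsewhere."""
    phi = np.zeros(len(knots))
    for i, k in enumerate(knots):
        left = knots[i - 1] if i > 0 else k
        right = knots[i + 1] if i < len(knots) - 1 else k
        if left < s <= k:
            phi[i] = (s - left) / (k - left)
        elif k <= s < right:
            phi[i] = (right - s) / (right - k)
        elif s == k:          # degenerate single-knot case
            phi[i] = 1.0
    return phi

def td0_update(theta, s, r, s_next, knots, alpha=0.1, gamma=0.95):
    """One TD(0) step for the linear-in-parameters value V(s) = phi(s)^T theta.
    Both the basis evaluation and the update are O(n) in the number of knots."""
    phi_s = piecewise_linear_basis(s, knots)
    phi_next = piecewise_linear_basis(s_next, knots)
    delta = r + gamma * phi_next @ theta - phi_s @ theta   # TD error
    return theta + alpha * delta * phi_s
```

Refining the knot grid enlarges the representable function class, which is the mechanism behind the error bound shrinking as the basis grows.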
(3) In contrast to linearly approximated reinforcement learning, kernel-based reinforcement learning has strong representational ability thanks to the Representer Theorem. However, typical kernel-based reinforcement learning algorithms cannot meet the accuracy and complexity requirements of online learning at the same time.

To deal with this problem, an algorithm named Online Selective Kernel-based Temporal Difference (OSKTD) learning is proposed. OSKTD consists of two online procedures: online sparsification, and parameter updating for the selective kernel-based value function. A new sparsification method, a kernel distance-based online sparsification method, is proposed based on selective ensemble learning; it is computationally less complex than other sparsification methods. With this method, the sparsified dictionary of samples is constructed online by checking whether each new sample needs to be added to the dictionary (a sketch of such a check appears after this summary). In addition, based on local validity, a selective kernel-based value function is proposed to select the best samples from the dictionary for the value function approximator. The parameters of the selective kernel-based value function are updated iteratively by temporal difference learning combined with gradient descent. The complexity of the online sparsification procedure in OSKTD is O(n).

(4) Real-world problems often require learning algorithms to handle both continuous state and continuous action spaces in order to control accurately, so reinforcement learning with continuous action spaces has become a hot research topic.

To deal with this problem, Actor-Critic methods are combined with kernel methods: Actor-Critic methods are good at handling continuous action spaces, while kernel methods are good at handling continuous state spaces. The resulting algorithm, Kernel-based Continuous-action Actor-Critic Learning (KCACL), has the actor update the probability of each action according to a reward-inaction scheme, while the critic updates the state value function with OSKTD learning. Empirical results demonstrate the effectiveness of all the proposed algorithms.
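As an illustration of the dictionary check described in contribution (3), below is a minimal sketch of a kernel distance-based online sparsification test: a sample is added only if it is far enough, in feature-space distance, from every sample already stored. The Gaussian kernel, the threshold mu, and the exact distance rule are assumptions for illustration, not necessarily the rule used by OSKTD.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel; any positive-definite kernel could be used instead."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def maybe_add_to_dictionary(dictionary, s, mu=0.1, sigma=1.0):
    """Illustrative kernel distance-based online sparsification: add state s to the
    dictionary only if its feature-space distance to the nearest stored sample
    exceeds the threshold mu. One pass over the dictionary, so the check is O(n)."""
    if not dictionary:
        dictionary.append(s)
        return True
    # ||phi(s) - phi(d)||^2 = k(s,s) - 2 k(s,d) + k(d,d) = 2 - 2 k(s,d) for a Gaussian kernel
    nearest_sq_dist = min(2.0 - 2.0 * gaussian_kernel(s, d, sigma) for d in dictionary)
    if nearest_sq_dist > mu:
        dictionary.append(s)
        return True
    return False
```

In an online setting this check runs once per observed state, so the dictionary, and with it the cost of evaluating the kernel-based value function, stays small.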
Keywords/Search Tags: Reinforcement Learning, Function Approximation, Kernel Methods, Piecewise Linear Basis, Selective Ensemble Learning, Local Validity, Actor-Critic Learning