Research On Least-Squares Policy Iteration Algorithms

Posted on:2015-02-10

Degree:Master

Type:Thesis

Country:China

Candidate:X Zhou

Full Text:PDF

GTID:2268330428498524

Subject:Computer application technology

Abstract/Summary:

Reinforcement learning is a class of machine learning methods that learn a mapping from states to actions so as to maximize the cumulative reward obtained by interacting with the environment. For reinforcement learning problems with large-scale or continuous state or action spaces, approximate reinforcement learning methods use function approximation to fit the policy. Least-squares policy iteration (LSPI) is a state-of-the-art approximate reinforcement learning method. Least-squares approximation extracts more useful information from the samples and can be applied effectively in online algorithms. This thesis focuses on the online least-squares policy iteration algorithm and proposes the following extensions, each with a corresponding algorithm:

i. To address the insufficient use of sample data in online least-squares policy iteration, a batch least-squares policy iteration (BLSPI) algorithm is proposed. The algorithm generates and stores samples online, then reuses these samples to update the control policy. This makes effective use of prior experience and improves both the utilization of experience and the convergence speed.

ii. To address the fixed form of the step-size parameter in the least-squares policy evaluation (LSPE) algorithm and the lack of automatic tuning, an autonomous batch least-squares policy iteration (ABLSPI) algorithm is proposed. The algorithm incorporates a fixed-point method for estimating the step-size parameter, which adjusts the step size dynamically according to the environment and the current policy. This further improves the utilization of experience, the convergence speed, and the stability of the learning process.

iii. The BLSPI algorithm is extended to continuous action spaces. To address slow convergence on problems with a large number of features and a high-dimensional state space, a batch least-squares policy iteration algorithm in continuous action spaces with fast feature selection (CABLSPI-FFS) is proposed. The algorithm uses a binary search to reduce the complexity of the action search and automatically selects the state features used to estimate the policy, which reduces the dimension of the state space, lowers the computational complexity, and improves the efficiency of the algorithm.
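
The sample-reuse idea behind BLSPI in item i can be illustrated with a minimal sketch. It is not the author's implementation: it uses the standard LSTD-Q evaluation step from the LSPI literature, and the toy four-state chain, the one-hot features, the small ridge term, and every function and variable name are assumptions made for illustration only.

```python
import numpy as np

def features(state, action, n_states, n_actions):
    """One-hot feature vector for a (state, action) pair (assumed tabular basis)."""
    phi = np.zeros(n_states * n_actions)
    phi[state * n_actions + action] = 1.0
    return phi

def lstdq(samples, policy, n_states, n_actions, gamma, ridge=1e-3):
    """Estimate Q-function weights for `policy` from the stored samples."""
    k = n_states * n_actions
    A = ridge * np.eye(k)  # small ridge term (assumption) keeps A well conditioned
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        phi = features(s, a, n_states, n_actions)
        phi_next = features(s_next, policy[s_next], n_states, n_actions)
        A += np.outer(phi, phi - gamma * phi_next)
        b += r * phi
    return np.linalg.solve(A, b)

def batch_lspi(samples, n_states, n_actions, gamma=0.9, max_iters=20):
    """Reuse the same stored batch in every policy-improvement step."""
    policy = np.zeros(n_states, dtype=int)
    for _ in range(max_iters):
        w = lstdq(samples, policy, n_states, n_actions, gamma)
        q = w.reshape(n_states, n_actions)
        new_policy = q.argmax(axis=1)  # greedy improvement
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, q

# Toy chain (hypothetical): action 0 moves left, action 1 moves right,
# and a reward of 1 is given whenever the next state is the rightmost one.
N_STATES, N_ACTIONS = 4, 2
samples = []
for s in range(N_STATES):
    for a in range(N_ACTIONS):
        s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
        reward = 1.0 if s_next == N_STATES - 1 else 0.0
        samples.append((s, a, reward, s_next))

policy, q = batch_lspi(samples, N_STATES, N_ACTIONS)
```

In this sketch the batch is collected once and then evaluated repeatedly under each new greedy policy, which is the sense in which stored experience is reused; in an online setting the batch would instead grow as the agent interacts with the environment.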

Keywords/Search Tags:

reinforcement learning, least-squares policy iteration, batch updating, fixed-point step-size estimation, feature selection

Related items

1	Research On Reinforcement Learning In Continuous Spaces
2	Reinforcement Learning Algorithm Study Based On ESN
3	Research On Motion Control Of Mobile Robots Based On Reinforcement Learning
4	Research On Policy Iteration Algorithm Within Bayesian Reinforcement Learning
5	Analysis And Research On Off-policy Algorithms In Reinforcement Learning
6	Efficient approximate policy iteration methods for sequential decision making in reinforcement learning
7	Policy Iteration Reinforcement Learning Based On Geodesic Gaussian Kernel
8	Research On Regularized Least Squares Policy Evaluation Algorithms In Reinforcement Learning
9	Research On Off-policy Reinforcement Learning Algorithm
10	Recursive Least-squares Reinforcement Learning Based On An Improved Extreme Learning Machine