Research On Least-Squares Policy Iteration Algorithms

Posted on: 2015-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhou
Full Text: PDF
GTID: 2268330428498524
Subject: Computer application technology
Abstract/Summary:
Reinforcement learning is a class of machine learning methods that learn a mapping from states to actions in order to maximize cumulative reward through interaction with the environment. For problems with large-scale or continuous state or action spaces, approximate reinforcement learning methods use function approximation to fit the policy. Least-squares policy iteration (LSPI) is a state-of-the-art approximate reinforcement learning method: the least-squares approximation extracts more useful information from the samples and can be applied effectively in online algorithms. This thesis focuses on the online least-squares policy iteration algorithm and proposes the following extensions and corresponding algorithms:

i. To address the insufficient use of sample data in online least-squares policy iteration, a batch least-squares policy iteration (BLSPI) algorithm is proposed. The algorithm generates and stores samples online, then reuses them to update the control policy. This makes effective use of prior experience and improves both the experience utilization rate and the convergence speed.

ii. To address the fixed form of the step-size parameter in the least-squares policy evaluation (LSPE) algorithm and its lack of automatic adjustment, an autonomous batch least-squares policy iteration (ABLSPI) algorithm is proposed. The algorithm incorporates a fixed-point method for step-size estimation to adjust the step-size parameter dynamically according to the environment and the current policy, which further improves the experience utilization rate, the convergence speed, and the stability of the learning process.

iii. BLSPI is extended to continuous action spaces. To address slow convergence on problems with many features and high-dimensional state spaces, a batch least-squares policy iteration in continuous action spaces with fast feature selection (CABLSPI-FFS) algorithm is proposed. The algorithm uses binary search to reduce the complexity of the action search, and selects state features automatically to estimate the policy, which reduces the dimension of the state space and the computational complexity and improves the efficiency of the algorithm.
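The batch idea in item i (store samples, then reuse them across policy updates) can be illustrated with a minimal sketch. It is not the thesis's BLSPI implementation: it is plain least-squares policy iteration with LSTD-Q evaluation over a stored batch, and the five-state chain environment, the one-hot features, and all names (`step`, `phi`, `lstdq`, `batch_lspi`) are assumptions made for illustration only.

```python
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9  # hypothetical toy problem


def step(s, a):
    """Hypothetical chain: action 0 moves left, action 1 moves right.
    Reward is 1 whenever the next state is the rightmost state."""
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward


def phi(s, a):
    """One-hot feature vector for a state-action pair."""
    f = np.zeros(N_STATES * N_ACTIONS)
    f[s * N_ACTIONS + a] = 1.0
    return f


def greedy(w, s):
    """Greedy action with respect to the linear Q-function phi(s, a) @ w."""
    return int(np.argmax([phi(s, a) @ w for a in range(N_ACTIONS)]))


def lstdq(samples, w):
    """Evaluate the policy that is greedy in w, using every stored sample."""
    k = N_STATES * N_ACTIONS
    A = 1e-6 * np.eye(k)  # small ridge term keeps A invertible
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, greedy(w, s_next))
        A += np.outer(f, f - GAMMA * f_next)
        b += r * f
    return np.linalg.solve(A, b)


def batch_lspi(samples, n_iter=20, tol=1e-8):
    """Policy iteration that reuses the same stored batch at every iteration."""
    w = np.zeros(N_STATES * N_ACTIONS)
    for _ in range(n_iter):
        w_new = lstdq(samples, w)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w


# Collect and store samples with a uniformly random behaviour policy.
rng = np.random.default_rng(0)
samples, s = [], 0
for _ in range(500):
    a = int(rng.integers(N_ACTIONS))
    s_next, r = step(s, a)
    samples.append((s, a, r, s_next))
    s = s_next

w = batch_lspi(samples)
policy = [greedy(w, s) for s in range(N_STATES)]
print(policy)
```

Because the stored batch covers every state-action pair of this small chain, the learned greedy policy should move right in every state. The same 500 transitions are reused at every iteration; no new interaction with the environment is needed after collection.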
Keywords/Search Tags: reinforcement learning, least-squares policy iteration, batch updating, fixed point of step-size estimation, feature selection