
Regularization in Least-Squares Temporal Difference: Penalization Versus Bayes

Posted on: 2019-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: B Y Yan
Full Text: PDF
GTID: 2370330545498030
Subject: Applied Mathematics
Abstract/Summary:
This thesis introduces a family of methods for optimizing Least-Squares Temporal Difference (LSTD) learning with penalty functions, and gives the mathematical formulations and solution procedures of their models. LSTD-2 extends classical LSTD with an additional l2 penalty to make the solution process more stable; LSTD-l1 benefits from sparsity in the coefficients; LSTD-l22 and LSTD-l21 separate the projection step from the fixed-point step and attach different penalty constraints to each. The sparsity constraint is further extended from the l1 penalty to the nonconcave penalty functions SCAD and MCP, which perform better in general feature selection.

From a Bayesian point of view, the thesis proposes the sparse-prior hierarchical Bayesian models bLSTD-u and bLSTD-ω, derives their full conditional posterior distributions, and solves them efficiently by Gibbs sampling. It also gives a method for estimating the regularization parameters, based on empirical-Bayes maximum likelihood estimation and on the full conditional posterior under a gamma prior.

Finally, the thesis compares the penalization and Bayesian-inference approaches in the projection step, and evaluates both on two classic reinforcement-learning problems in the numerical experiments. The experiments show that Bayesian inference with sparse priors achieves results similar to optimization with a sparse l1 penalty, and far better than the plain l2 penalty, which does not exploit sparsity.
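Among these variants, the l2-penalized one is the simplest to illustrate: classical LSTD solves the linear system A w = b with A = Φᵀ(Φ − γΦ′) and b = Φᵀr, and the l2 penalty adds a ridge term βI to A, stabilizing the solve when features are correlated or samples are scarce. The sketch below uses synthetic random features and rewards; the dimensions, β, and γ are illustrative placeholders, not the thesis's models or experimental settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sampled transitions (phi, reward, phi_next) from some MRP.
n_samples, n_features = 200, 5
Phi = rng.normal(size=(n_samples, n_features))       # features of s_t
Phi_next = rng.normal(size=(n_samples, n_features))  # features of s_{t+1}
r = rng.normal(size=n_samples)                       # observed rewards
gamma = 0.95                                         # discount factor

# Classical LSTD solves A w = b with
#   A = Phi^T (Phi - gamma * Phi_next),  b = Phi^T r.
A = Phi.T @ (Phi - gamma * Phi_next)
b = Phi.T @ r

# l2-penalized LSTD: add beta * I to A so the system stays
# well-conditioned even with correlated or redundant features.
beta = 1.0
w_l2 = np.linalg.solve(A + beta * np.eye(n_features), b)

# Mean squared temporal-difference error of the fitted value function.
td_error = r + gamma * (Phi_next @ w_l2) - Phi @ w_l2
print(w_l2)
print(float(np.mean(td_error ** 2)))
```

An l1 penalty would replace the closed-form solve with an iterative scheme (e.g. coordinate descent on the projection step), which is what makes the sparse variants discussed above computationally heavier but better at feature selection.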
Keywords/Search Tags: Reinforcement Learning, Least-Squares Temporal Difference, Regularization, Penalization, Hierarchical Bayes