
Regularization in Least-Squares Temporal Difference: Penalization Versus Bayes

Posted on: 2019-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: B Y Yan
Full Text: PDF
GTID: 2370330545498030
Subject: Applied Mathematics
Abstract/Summary:
This thesis introduces a family of methods for optimizing Least-Squares Temporal Difference (LSTD) learning with penalty functions, and gives the mathematical formulations and solution procedures of their models. LSTD-2 extends classical LSTD with an additional l2 penalty to make the solution process more stable; LSTD-l1 benefits from sparsity in the coefficients; LSTD-l22 and LSTD-l21 separate the projection step from the fixed-point step and attach different penalty constraints to each. The sparsity constraint is further extended from the l1 penalty to the nonconcave penalty functions SCAD and MCP, which perform better in general feature selection.

From a Bayesian point of view, the thesis proposes the sparse-prior hierarchical Bayesian models bLSTD-u and bLSTD-ω, derives their full conditional posterior distributions, and solves them efficiently by Gibbs sampling. It also gives a method for estimating the regularization parameters, based on empirical-Bayes maximum likelihood estimation and on the full conditional posterior under a gamma prior.

Finally, the thesis compares the penalization and Bayesian-inference approaches in the projection step, and evaluates both on two classic reinforcement-learning problems in the numerical experiments. The experiments show that Bayesian inference with sparse priors achieves results similar to optimization with a sparse l1 penalty, and far better than the plain l2 penalty, which does not exploit sparsity.
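Among these variants, the l2-penalized one is the simplest to illustrate: classical LSTD solves the linear system A w = b with A = Φᵀ(Φ − γΦ′) and b = Φᵀr, and the l2 penalty adds a ridge term βI to A, stabilizing the solve when features are correlated or samples are scarce. The sketch below uses synthetic random features and rewards; the dimensions, β, and γ are illustrative placeholders, not the thesis's models or experimental settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sampled transitions (phi, reward, phi_next) from some MRP.
n_samples, n_features = 200, 5
Phi = rng.normal(size=(n_samples, n_features))       # features of s_t
Phi_next = rng.normal(size=(n_samples, n_features))  # features of s_{t+1}
r = rng.normal(size=n_samples)                       # observed rewards
gamma = 0.95                                         # discount factor

# Classical LSTD solves A w = b with
#   A = Phi^T (Phi - gamma * Phi_next),  b = Phi^T r.
A = Phi.T @ (Phi - gamma * Phi_next)
b = Phi.T @ r

# l2-penalized LSTD: add beta * I to A so the system stays
# well-conditioned even with correlated or redundant features.
beta = 1.0
w_l2 = np.linalg.solve(A + beta * np.eye(n_features), b)

# Mean squared temporal-difference error of the fitted value function.
td_error = r + gamma * (Phi_next @ w_l2) - Phi @ w_l2
print(w_l2)
print(float(np.mean(td_error ** 2)))
```

An l1 penalty would replace the closed-form solve with an iterative scheme (e.g. coordinate descent on the projection step), which is what makes the sparse variants discussed above computationally heavier but better at feature selection.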
Keywords/Search Tags: Reinforcement Learning, Least-Squares Temporal Difference, Regularization, Penalization, Hierarchical Bayes