Inverse Reinforcement Learning Under Average Reward Criterion

Posted on: 2014-11-15  Degree: Master  Type: Thesis
Country: China  Candidate: Z R Tao  Full Text: PDF
GTID: 2298330422990620  Subject: Control Engineering
Abstract/Summary:
In reinforcement learning, the reward function is usually set by hand based on experience, so its optimality is hard to guarantee. Apprenticeship learning likewise requires a reward function to be specified. Inverse reinforcement learning can recover the underlying reward function by learning from demonstration trajectories or an expert's policy, providing an effective way to construct the reward function while avoiding the subjectivity of hand-crafted design. It is therefore worthwhile to study inverse reinforcement learning.

To date, inverse reinforcement learning has focused mainly on Markov Decision Processes (MDPs) under the discounted criterion; inverse reinforcement learning under the average reward criterion has received almost no attention. This thesis therefore studies inverse reinforcement learning under the average reward criterion to address the problem of constructing reward functions. The work consists of two parts. First, based on the sensitivity idea, we derive a sensitivity-based inverse reinforcement learning algorithm for small state spaces by analyzing the performance difference formula under the average reward criterion. Second, for large state spaces, or when the reward function cannot be described directly, we represent the reward function as a linear combination of feature basis functions. Combining this representation with the maximum margin idea, the zero-sum game idea, and the natural gradient idea yields three further inverse reinforcement learning algorithms under the average reward criterion: maximum margin, zero-sum game, and natural gradient inverse reinforcement learning.

All four algorithms are implemented both in a grid world and on an unmanned-vehicle simulation platform. Their effectiveness is evaluated in three respects: the number of states in which the computed policy's action differs from the expert's, the difference in average reward between the computed policy and the expert policy, and the value of the recovered reward function. In addition, we analyze how strongly each of the four algorithms depends on the expert policy and the environment, and compare the advantages and disadvantages of the algorithms.
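The reward representation described above can be sketched in a few lines. This is a minimal illustrative example, not the thesis's implementation: the feature basis functions `phi`, the weight vector `w`, and the 1-D state space are all assumptions chosen for brevity. It shows the two ingredients the abstract names: a reward modeled as a linear combination of feature basis functions, and performance measured as average (per-step) reward rather than a discounted sum.

```python
import numpy as np

def phi(state, n_features=4):
    """Feature basis functions: Gaussian bumps over a 1-D state in [0, 1].
    (Illustrative choice; any fixed basis would fit the same template.)"""
    centers = np.linspace(0.0, 1.0, n_features)
    return np.exp(-((state - centers) ** 2) / 0.1)

def reward(state, w):
    """Reward as a linear combination of the feature basis functions:
    r(s) = w . phi(s)."""
    return float(w @ phi(state))

def average_reward(trajectory, w):
    """Average-reward criterion: the mean per-step reward along a
    trajectory, with no discount factor."""
    return float(np.mean([reward(s, w) for s in trajectory]))

# Hypothetical weights and a short demonstration trajectory.
w = np.array([1.0, 0.5, -0.5, 0.0])
expert_traj = [0.1, 0.2, 0.4, 0.8]
print(average_reward(expert_traj, w))
```

Inverse reinforcement learning then searches for weights `w` under which the expert's trajectories attain higher average reward than alternative policies; the maximum margin, zero-sum game, and natural gradient variants differ in how that search is posed and solved.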
Keywords/Search Tags: inverse reinforcement learning, policy, MDP, feature basis function, average reward