
Relational Reinforcement Learning Based On Logical MDPs With Negation

Posted on: 2007-06-08    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Z W Song    Full Text: PDF
GTID: 1118360272462462    Subject: Computer Science and Technology
Abstract/Summary:
It is widely agreed that an intelligent agent should be able to learn in order to adapt to changes in a dynamic environment. Reinforcement Learning (RL) lets an agent learn a policy and capture information about the environment by trying actions and receiving feedback from its interaction with the environment, without a supervisor. Based on the Markov Decision Process (MDP), many RL algorithms have been proposed, with much progress over the past years. Propositional representations of states, i.e. attribute-value representations, have also been studied extensively. However, their usefulness in complex domains is limited by their inability to incorporate relational information about the environment, especially in domains with objects of various kinds. To solve this problem, Relational Reinforcement Learning (RRL) was proposed, based on relational representations: it concerns the use of RL in complex domains where states and actions take relational form. Furthermore, RRL focuses on abstraction methods, because the state space is huge when states are represented by ground atoms. However, RRL is still not well understood and its theory is not yet sufficient, although a number of RRL algorithms have been developed and several preliminary models proposed.

This work builds on the Logical MDP (LOMDP). We propose a new model of RRL called the Logical MDP with Negation (nLMDP), and on top of it a Θ(λ)-learning method and a states-evolution algorithm.

In the nLMDP, logical negation is introduced for the first time to describe the environment and the task precisely. A complementary abstract state space can be constructed by applying a generating method to the goal state once and then an expanding method several times in turn. These two methods are useful tools that let a designer construct the complementary abstract state space easily; complementarity means that each ground state is represented by exactly one abstract state, and that together the abstract states represent all ground states. The prototype action, a super-abstraction over abstract actions, is also introduced into the nLMDP: it captures the basic ways of acting in the environment, and logical negation is used in its precondition as well. Given the set of prototype actions and the complementary abstract state space, the applicable abstract actions of a given abstract state can be obtained easily. Consequently, an nLMDP is defined over a complementary abstract state space and a set of prototype actions.

Based on the nLMDP, we propose Θ(λ)-learning for obtaining the valid substitutions from prototype actions to abstract states and for estimating the values of these substitutions. Experiments show that it is an efficient algorithm.

For a very complex domain, it is rather difficult for a designer to specify a perfect state space and judgement criterion in advance. Based on the nLMDP and Θ(λ)-learning, a states-evolution algorithm is therefore proposed: a complementary abstract state space emerges while the values of actions and the policy are being learned.
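To make the complementarity requirement concrete, here is a minimal sketch in Python, assuming abstract states are conjunctions of positive and negated first-order literals read with negation as failure. The blocks-world vocabulary (on/clear, blocks a and b) and all function names are illustrative assumptions, not taken from the dissertation.

    def is_var(term):
        # Convention (assumed): variables are capitalized strings, constants lowercase.
        return isinstance(term, str) and term[:1].isupper()

    def unify(literal, atom, subst):
        # Try to extend `subst` so that `literal` matches the ground `atom`;
        # return the extended substitution, or None on failure.
        if literal[0] != atom[0] or len(literal) != len(atom):
            return None
        s = dict(subst)
        for t, g in zip(literal[1:], atom[1:]):
            if is_var(t):
                if s.get(t, g) != g:
                    return None
                s[t] = g
            elif t != g:
                return None
        return s

    def covers(pos, neg, state):
        # An abstract state (pos, neg) covers a ground state iff some
        # substitution matches every positive literal and, under it, no
        # negated literal matches any atom (negation as failure).
        substs = [{}]
        for lit in pos:
            substs = [s2 for s in substs for atom in state
                      for s2 in (unify(lit, atom, s),) if s2 is not None]
        return any(all(unify(lit, atom, s) is None
                       for lit in neg for atom in state)
                   for s in substs)

    def complementary(abstract_states, ground_states):
        # Complementarity as described above: every ground state is covered
        # by exactly one abstract state, i.e. the abstraction is a partition.
        return all(sum(covers(p, n, g) for p, n in abstract_states) == 1
                   for g in ground_states)

    # Tiny two-blocks illustration (the vocabulary is an assumption):
    ground = [
        {("on", "a", "b"), ("clear", "a")},      # a stacked on b
        {("on", "b", "a"), ("clear", "b")},      # b stacked on a
        {("clear", "a"), ("clear", "b")},        # both blocks on the table
    ]
    abstract = [
        ([("on", "X", "Y")], []),                # something is stacked
        ([("clear", "X")], [("on", "Y", "Z")]),  # nothing is stacked
    ]
    print(complementary(abstract, ground))       # True: a partition

The same unification routine is the kind of machinery that finding valid substitutions from prototype-action preconditions to abstract states would rely on.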
As a result of states evolution, only the goal state and the prototype actions need to be supplied by the designer. Experiments show that the agent captures the essence of the task and that the self-organized states are rational.

The main contributions are summarized as follows.
1. Logical negation is introduced into abstract states to describe the environment and the task precisely. The generating and expanding methods give the designer an easy way to construct the complementary abstract state space.
2. The prototype action is proposed, with logical negation used in its precondition. The set of applicable abstract actions is also defined formally so that it can be obtained automatically.
3. Based on the complementary abstract state space and the prototype actions, a new model of RRL, the nLMDP, is proposed.
4. Θ(λ)-learning is proposed for obtaining valid substitutions automatically and estimating their values (a schematic sketch follows this list).
5. The theory and method of states evolution are proposed. The agent learns not only the policy but also the abstract state space during evolution. This leads to a framework that strengthens the agent's intelligence and simplifies the designer's work.
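The abstract does not spell out the Θ(λ)-learning update itself, so the following is only a schematic sketch, assuming a Q(λ)-style temporal-difference backup over (abstract state, prototype action, substitution) keys with eligibility traces; every name and hyperparameter here is an illustrative assumption.

    from collections import defaultdict

    ALPHA, GAMMA, LAMBDA = 0.1, 0.9, 0.8   # assumed learning parameters

    Q = defaultdict(float)      # (state_id, proto_action, subst) -> value
    trace = defaultdict(float)  # eligibility trace per key

    def theta_lambda_update(key, reward, best_next_value):
        # One TD(lambda)-style backup: the temporal-difference error for
        # the visited key is propagated to recently visited keys via traces.
        delta = reward + GAMMA * best_next_value - Q[key]
        trace[key] += 1.0                    # accumulating trace
        for k in list(trace):
            Q[k] += ALPHA * delta * trace[k]
            trace[k] *= GAMMA * LAMBDA       # decay every trace

    # A key might look like: (2, "move", (("X", "a"), ("Y", "b"))), where
    # the substitution binds the prototype action's variables in state 2.
    key = (2, "move", (("X", "a"), ("Y", "b")))
    theta_lambda_update(key, reward=1.0, best_next_value=0.0)
    print(Q[key])  # 0.1 after one backup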
Keywords/Search Tags: Relational Reinforcement Learning, Logical Markov Decision Processes with Negation, Θ(λ)-Learning, States Evolution