
Relational Reinforcement Learning Based On Logical MDPs With Negation

Posted on: 2007-06-08    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Z W Song    Full Text: PDF
GTID: 1118360272462462    Subject: Computer Science and Technology
Abstract/Summary:
It is widely agreed that an intelligent agent should be able to learn in order to adapt to changes in a dynamic environment. Reinforcement Learning (RL) lets an agent learn a policy and capture information about the environment by trying actions and receiving feedback from its interaction with the environment, without a supervisor. Based on the Markov Decision Process (MDP), many RL algorithms have been proposed, with much progress over the past years. Propositional representations of states, i.e. attribute-value representations, have also been studied extensively. However, their usefulness in complex domains is limited by their inability to incorporate relational information about the environment, especially in domains with objects of various kinds. To solve this problem, Relational Reinforcement Learning (RRL) was proposed, based on relational representations: it concerns the use of RL in complex domains where states and actions take relational form. Furthermore, RRL focuses on abstraction methods, because the state space is huge when states are represented by ground atoms. However, RRL is still not well understood and its theory is not yet sufficient, although a number of RRL algorithms have been developed and several preliminary models proposed.

This work builds on the Logical MDP (LOMDP). We propose a new model of RRL called the Logical MDP with Negation (nLMDP), and on top of it a Θ(λ)-learning method and a states-evolution algorithm.

In the nLMDP, logical negation is introduced for the first time to describe the environment and the task precisely. A complementary abstract state space can be constructed by applying a generating method to the goal state once and then an expanding method several times in turn. These two methods are useful tools that let a designer construct the complementary abstract state space easily; complementarity means that each ground state is represented by exactly one abstract state, and that together the abstract states represent all ground states. The prototype action, a super-abstraction over abstract actions, is also introduced into the nLMDP: it captures the basic ways of acting in the environment, and logical negation is used in its precondition as well. Given the set of prototype actions and the complementary abstract state space, the applicable abstract actions of a given abstract state can be obtained easily. Consequently, an nLMDP is defined over a complementary abstract state space and a set of prototype actions.

Based on the nLMDP, we propose Θ(λ)-learning for obtaining the valid substitutions from prototype actions to abstract states and for estimating the values of these substitutions. Experiments show that it is an efficient algorithm.

For a very complex domain, it is rather difficult for a designer to specify a perfect state space and judgement criterion in advance. Based on the nLMDP and Θ(λ)-learning, a states-evolution algorithm is therefore proposed: a complementary abstract state space emerges while the values of actions and the policy are being learned.
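To make the complementarity requirement concrete, here is a minimal sketch in Python, assuming abstract states are conjunctions of positive and negated first-order literals read with negation as failure. The blocks-world vocabulary (on/clear, blocks a and b) and all function names are illustrative assumptions, not taken from the dissertation.

    def is_var(term):
        # Convention (assumed): variables are capitalized strings, constants lowercase.
        return isinstance(term, str) and term[:1].isupper()

    def unify(literal, atom, subst):
        # Try to extend `subst` so that `literal` matches the ground `atom`;
        # return the extended substitution, or None on failure.
        if literal[0] != atom[0] or len(literal) != len(atom):
            return None
        s = dict(subst)
        for t, g in zip(literal[1:], atom[1:]):
            if is_var(t):
                if s.get(t, g) != g:
                    return None
                s[t] = g
            elif t != g:
                return None
        return s

    def covers(pos, neg, state):
        # An abstract state (pos, neg) covers a ground state iff some
        # substitution matches every positive literal and, under it, no
        # negated literal matches any atom (negation as failure).
        substs = [{}]
        for lit in pos:
            substs = [s2 for s in substs for atom in state
                      for s2 in (unify(lit, atom, s),) if s2 is not None]
        return any(all(unify(lit, atom, s) is None
                       for lit in neg for atom in state)
                   for s in substs)

    def complementary(abstract_states, ground_states):
        # Complementarity as described above: every ground state is covered
        # by exactly one abstract state, i.e. the abstraction is a partition.
        return all(sum(covers(p, n, g) for p, n in abstract_states) == 1
                   for g in ground_states)

    # Tiny two-blocks illustration (the vocabulary is an assumption):
    ground = [
        {("on", "a", "b"), ("clear", "a")},      # a stacked on b
        {("on", "b", "a"), ("clear", "b")},      # b stacked on a
        {("clear", "a"), ("clear", "b")},        # both blocks on the table
    ]
    abstract = [
        ([("on", "X", "Y")], []),                # something is stacked
        ([("clear", "X")], [("on", "Y", "Z")]),  # nothing is stacked
    ]
    print(complementary(abstract, ground))       # True: a partition

The same unification routine is the kind of machinery that finding valid substitutions from prototype-action preconditions to abstract states would rely on.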
As a result of states evolution, only the goal state and the prototype actions need to be supplied by the designer. Experiments show that the agent captures the essence of the task and that the self-organized states are rational.

The main contributions are summarized as follows.
1. Logical negation is introduced into abstract states to describe the environment and the task precisely. The generating and expanding methods give the designer an easy way to construct the complementary abstract state space.
2. The prototype action is proposed, with logical negation used in its precondition. The set of applicable abstract actions is also defined formally so that it can be obtained automatically.
3. Based on the complementary abstract state space and the prototype actions, a new model of RRL, the nLMDP, is proposed.
4. Θ(λ)-learning is proposed for obtaining valid substitutions automatically and estimating their values (a schematic sketch follows this list).
5. The theory and method of states evolution are proposed. The agent learns not only the policy but also the abstract state space during evolution. This leads to a framework that strengthens the agent's intelligence and simplifies the designer's work.
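The abstract does not spell out the Θ(λ)-learning update itself, so the following is only a schematic sketch, assuming a Q(λ)-style temporal-difference backup over (abstract state, prototype action, substitution) keys with eligibility traces; every name and hyperparameter here is an illustrative assumption.

    from collections import defaultdict

    ALPHA, GAMMA, LAMBDA = 0.1, 0.9, 0.8   # assumed learning parameters

    Q = defaultdict(float)      # (state_id, proto_action, subst) -> value
    trace = defaultdict(float)  # eligibility trace per key

    def theta_lambda_update(key, reward, best_next_value):
        # One TD(lambda)-style backup: the temporal-difference error for
        # the visited key is propagated to recently visited keys via traces.
        delta = reward + GAMMA * best_next_value - Q[key]
        trace[key] += 1.0                    # accumulating trace
        for k in list(trace):
            Q[k] += ALPHA * delta * trace[k]
            trace[k] *= GAMMA * LAMBDA       # decay every trace

    # A key might look like: (2, "move", (("X", "a"), ("Y", "b"))), where
    # the substitution binds the prototype action's variables in state 2.
    key = (2, "move", (("X", "a"), ("Y", "b")))
    theta_lambda_update(key, reward=1.0, best_next_value=0.0)
    print(Q[key])  # 0.1 after one backup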
Keywords/Search Tags: Relational Reinforcement Learning, Logical Markov Decision Processes with Negation, Θ(λ)-Learning, States Evolution