
Research On Optimized Methods Of Planning Within Reinforcement Learning

Posted on: 2014-02-12    Degree: Master    Type: Thesis
Country: China    Candidate: H K Sun    Full Text: PDF
GTID: 2248330398465514    Subject: Management Science and Engineering
Abstract/Summary:
Reinforcement learning is an important class of machine learning methods that has been widely applied in robotics, economics, industrial manufacturing, games, and other fields. A reinforcement learning agent learns a mapping from environment states to actions with the aim of maximizing the cumulative reward it receives from the environment. Reinforcement learning can be divided into two basic processes: learning and planning. Learning means that the agent interacts with the environment directly and updates value functions from the acquired real experience in order to improve its policy. Planning means generating simulated experience from a model of the environment, with that simulated experience likewise used to update value functions and improve the policy.

The "curse of dimensionality" and slow convergence are common but serious problems when solving reinforcement learning problems over large state spaces. To address them, this thesis proposes two optimized algorithms, one for tasks with a known model and one for model-free tasks, both aimed at improving the convergence performance of planning. The main research contents are as follows:

ⅰ. An optimized value iteration algorithm based on topological-sequence backups, named VI-TS, is presented; it aims to accelerate the convergence of classical value iteration and to enhance its stability. VI-TS constructs the directed graph of the task model, decomposes the graph into strongly connected components, and then computes the value functions of the states in each component in topological order (see the first sketch below). Planning efficiency improves because the number of iterations needed and the effective dimensionality of the state space both decrease after the decomposition. VI-TS also performs a phase of heuristic search that eliminates provably sub-optimal actions, so it can be applied to a wide range of planning problems. This thesis gives a theoretical convergence proof for VI-TS and analyses its efficiency and applicability through a series of classical AI planning experiments.

ⅱ. An optimized Dyna-architecture algorithm with prioritized sweeping, named Dyna-PS, is proposed; its goal is to speed up the convergence of the traditional Dyna architecture. Dyna-PS integrates the idea of prioritized sweeping into the planning part of the Dyna algorithm: during iteration it updates value functions in order of a priority function and omits the updates of unrelated, insignificant states that traditional value iteration and policy iteration would perform (see the second sketch below). Dyna-PS significantly improves the efficiency of the Dyna algorithm by improving the performance of its planning part. This thesis also gives a theoretical convergence proof for Dyna-PS and analyses its efficiency through a series of classical AI planning experiments.
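Both algorithms iterate toward the same fixed point: the Bellman optimality equation for the state-value function, which for a discounted MDP with transition probabilities P and rewards R can be written as

    V^*(s) = \max_{a} \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V^*(s') \right]

where \gamma \in [0, 1) is the discount factor; value iteration applies this equation repeatedly as a backup operator until the values converge.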
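A minimal Python sketch of the VI-TS ordering idea follows, assuming a tabular MDP given as transitions[s][a] = [(prob, next_state, reward), ...]; the names and parameters here (tarjan_sccs, vi_ts, gamma, theta) are illustrative, not the thesis's actual code. Tarjan's algorithm emits strongly connected components with sink components first, which is exactly the order in which values should be backed up, since V(s) depends only on the values of successor states.

def tarjan_sccs(graph):
    # Tarjan's algorithm: returns strongly connected components,
    # sink components first (reverse topological order of the
    # condensation), which is the backup order VI-TS needs.
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []
    counter = [0]
    def visit(v):
        index[v] = low[v] = counter[0]; counter[0] += 1
        stack.append(v); on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of an SCC
            comp = []
            while True:
                w = stack.pop(); on_stack.discard(w); comp.append(w)
                if w == v:
                    break
            sccs.append(comp)
    for v in graph:
        if v not in index:
            visit(v)
    return sccs

def vi_ts(transitions, gamma=0.95, theta=1e-6):
    # Directed graph of the task model: an edge s -> s' for every
    # transition with nonzero probability.
    nodes = set(transitions)
    for actions in transitions.values():
        for outcomes in actions.values():
            nodes.update(s2 for p, s2, r in outcomes if p > 0)
    graph = {s: set() for s in nodes}
    for s, actions in transitions.items():
        for outcomes in actions.values():
            graph[s].update(s2 for p, s2, r in outcomes if p > 0)
    V = {s: 0.0 for s in nodes}
    # Sweep one component at a time; values outside the current
    # component are already final, so each sweep is local and cheap.
    for comp in tarjan_sccs(graph):
        while True:
            delta = 0.0
            for s in comp:
                q = [sum(p * (r + gamma * V[s2]) for p, s2, r in outs)
                     for outs in transitions.get(s, {}).values()]
                best = max(q) if q else 0.0
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < theta:
                break
    return V

The heuristic-search phase that prunes provably sub-optimal actions is omitted here for brevity; this sketch shows only the component-wise backup ordering.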
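A minimal sketch of the Dyna-PS idea, in the same spirit: a Dyna-style agent whose planning backups are drawn from a priority queue ordered by Bellman-error magnitude, as in prioritized sweeping, rather than replayed uniformly. The class name, the deterministic model, and the parameter names (alpha, theta, n_planning) are illustrative assumptions, not the thesis's code.

import heapq
import itertools
from collections import defaultdict

class DynaPS:
    # Dyna-style agent: direct learning on real experience plus
    # model-based planning, with planning backups ordered by priority.
    def __init__(self, actions, alpha=0.1, gamma=0.95,
                 theta=1e-4, n_planning=10):
        self.Q = defaultdict(float)           # Q[(s, a)]
        self.model = {}                       # model[(s, a)] = (r, s2); deterministic for brevity
        self.predecessors = defaultdict(set)  # predecessors[s2] = {(s, a), ...}
        self.pq = []                          # max-queue via negated priorities
        self.tie = itertools.count()          # tie-breaker so the heap never compares states
        self.actions = actions
        self.alpha, self.gamma, self.theta = alpha, gamma, theta
        self.n_planning = n_planning

    def _priority(self, s, a, r, s2):
        # Magnitude of the one-step Bellman error: how much Q(s, a)
        # would move if it were backed up right now.
        target = r + self.gamma * max(self.Q[(s2, b)] for b in self.actions)
        return abs(target - self.Q[(s, a)])

    def _push(self, s, a):
        r, s2 = self.model[(s, a)]
        p = self._priority(s, a, r, s2)
        if p > self.theta:                    # omit insignificant updates
            heapq.heappush(self.pq, (-p, next(self.tie), (s, a)))

    def observe(self, s, a, r, s2):
        # Learning: real experience updates the model and seeds the queue.
        self.model[(s, a)] = (r, s2)
        self.predecessors[s2].add((s, a))
        self._push(s, a)
        # Planning: back up the most urgent pairs first and propagate
        # any change backwards through predecessor states.
        for _ in range(self.n_planning):
            if not self.pq:
                break
            _, _, (ps, pa) = heapq.heappop(self.pq)
            pr, ps2 = self.model[(ps, pa)]
            target = pr + self.gamma * max(self.Q[(ps2, b)] for b in self.actions)
            self.Q[(ps, pa)] += self.alpha * (target - self.Q[(ps, pa)])
            for (qs, qa) in self.predecessors[ps]:
                self._push(qs, qa)

An ε-greedy action choice and an environment loop would wrap observe(); concentrating backups where the Bellman error is largest, rather than sweeping all states, is the effect the thesis measures against the traditional Dyna architecture.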
Keywords/Search Tags: reinforcement learning, planning, topological sequence, VI-TS, prioritized sweeping, Dyna-PS