
Research On Optimized Methods Of Planning Within Reinforcement Learning

Posted on: 2014-02-12    Degree: Master    Type: Thesis
Country: China    Candidate: H K Sun    Full Text: PDF
GTID: 2248330398465514    Subject: Management Science and Engineering
Abstract/Summary:
Reinforcement learning is an important class of machine learning methods that has been widely applied in robotics, economics, industrial manufacturing, games, and other fields. A reinforcement learning agent learns a mapping from environment states to actions with the aim of maximizing the cumulative reward it receives from the environment. Reinforcement learning can be divided into two basic processes: learning and planning. Learning means that the agent interacts with the environment directly and updates value functions from the acquired real experience in order to improve its policy. Planning means generating simulated experience from a model of the environment, with that simulated experience likewise used to update value functions and improve the policy.

The "curse of dimensionality" and slow convergence are common but serious problems when solving reinforcement learning problems over large state spaces. To address them, this thesis proposes two optimized algorithms, one for tasks with a known model and one for model-free tasks, both aimed at improving the convergence performance of planning. The main research contents are as follows:

ⅰ. An optimized value iteration algorithm based on topological-sequence backups, named VI-TS, is presented; it aims to accelerate the convergence of classical value iteration and to enhance its stability. VI-TS constructs the directed graph of the task model, decomposes the graph into strongly connected components, and then computes the value functions of the states in each component in topological order (see the first sketch below). Planning efficiency improves because the number of iterations needed and the effective dimensionality of the state space both decrease after the decomposition. VI-TS also performs a phase of heuristic search that eliminates provably sub-optimal actions, so it can be applied to a wide range of planning problems. This thesis gives a theoretical convergence proof for VI-TS and analyses its efficiency and applicability through a series of classical AI planning experiments.

ⅱ. An optimized Dyna-architecture algorithm with prioritized sweeping, named Dyna-PS, is proposed; its goal is to speed up the convergence of the traditional Dyna architecture. Dyna-PS integrates the idea of prioritized sweeping into the planning part of the Dyna algorithm: during iteration it updates value functions in order of a priority function and omits the updates of unrelated, insignificant states that traditional value iteration and policy iteration would perform (see the second sketch below). Dyna-PS significantly improves the efficiency of the Dyna algorithm by improving the performance of its planning part. This thesis also gives a theoretical convergence proof for Dyna-PS and analyses its efficiency through a series of classical AI planning experiments.
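Both algorithms iterate toward the same fixed point: the Bellman optimality equation for the state-value function, which for a discounted MDP with transition probabilities P and rewards R can be written as

    V^*(s) = \max_{a} \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V^*(s') \right]

where \gamma \in [0, 1) is the discount factor; value iteration applies this equation repeatedly as a backup operator until the values converge.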
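A minimal Python sketch of the VI-TS ordering idea follows, assuming a tabular MDP given as transitions[s][a] = [(prob, next_state, reward), ...]; the names and parameters here (tarjan_sccs, vi_ts, gamma, theta) are illustrative, not the thesis's actual code. Tarjan's algorithm emits strongly connected components with sink components first, which is exactly the order in which values should be backed up, since V(s) depends only on the values of successor states.

def tarjan_sccs(graph):
    # Tarjan's algorithm: returns strongly connected components,
    # sink components first (reverse topological order of the
    # condensation), which is the backup order VI-TS needs.
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []
    counter = [0]
    def visit(v):
        index[v] = low[v] = counter[0]; counter[0] += 1
        stack.append(v); on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of an SCC
            comp = []
            while True:
                w = stack.pop(); on_stack.discard(w); comp.append(w)
                if w == v:
                    break
            sccs.append(comp)
    for v in graph:
        if v not in index:
            visit(v)
    return sccs

def vi_ts(transitions, gamma=0.95, theta=1e-6):
    # Directed graph of the task model: an edge s -> s' for every
    # transition with nonzero probability.
    nodes = set(transitions)
    for actions in transitions.values():
        for outcomes in actions.values():
            nodes.update(s2 for p, s2, r in outcomes if p > 0)
    graph = {s: set() for s in nodes}
    for s, actions in transitions.items():
        for outcomes in actions.values():
            graph[s].update(s2 for p, s2, r in outcomes if p > 0)
    V = {s: 0.0 for s in nodes}
    # Sweep one component at a time; values outside the current
    # component are already final, so each sweep is local and cheap.
    for comp in tarjan_sccs(graph):
        while True:
            delta = 0.0
            for s in comp:
                q = [sum(p * (r + gamma * V[s2]) for p, s2, r in outs)
                     for outs in transitions.get(s, {}).values()]
                best = max(q) if q else 0.0
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < theta:
                break
    return V

The heuristic-search phase that prunes provably sub-optimal actions is omitted here for brevity; this sketch shows only the component-wise backup ordering.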
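A minimal sketch of the Dyna-PS idea, in the same spirit: a Dyna-style agent whose planning backups are drawn from a priority queue ordered by Bellman-error magnitude, as in prioritized sweeping, rather than replayed uniformly. The class name, the deterministic model, and the parameter names (alpha, theta, n_planning) are illustrative assumptions, not the thesis's code.

import heapq
import itertools
from collections import defaultdict

class DynaPS:
    # Dyna-style agent: direct learning on real experience plus
    # model-based planning, with planning backups ordered by priority.
    def __init__(self, actions, alpha=0.1, gamma=0.95,
                 theta=1e-4, n_planning=10):
        self.Q = defaultdict(float)           # Q[(s, a)]
        self.model = {}                       # model[(s, a)] = (r, s2); deterministic for brevity
        self.predecessors = defaultdict(set)  # predecessors[s2] = {(s, a), ...}
        self.pq = []                          # max-queue via negated priorities
        self.tie = itertools.count()          # tie-breaker so the heap never compares states
        self.actions = actions
        self.alpha, self.gamma, self.theta = alpha, gamma, theta
        self.n_planning = n_planning

    def _priority(self, s, a, r, s2):
        # Magnitude of the one-step Bellman error: how much Q(s, a)
        # would move if it were backed up right now.
        target = r + self.gamma * max(self.Q[(s2, b)] for b in self.actions)
        return abs(target - self.Q[(s, a)])

    def _push(self, s, a):
        r, s2 = self.model[(s, a)]
        p = self._priority(s, a, r, s2)
        if p > self.theta:                    # omit insignificant updates
            heapq.heappush(self.pq, (-p, next(self.tie), (s, a)))

    def observe(self, s, a, r, s2):
        # Learning: real experience updates the model and seeds the queue.
        self.model[(s, a)] = (r, s2)
        self.predecessors[s2].add((s, a))
        self._push(s, a)
        # Planning: back up the most urgent pairs first and propagate
        # any change backwards through predecessor states.
        for _ in range(self.n_planning):
            if not self.pq:
                break
            _, _, (ps, pa) = heapq.heappop(self.pq)
            pr, ps2 = self.model[(ps, pa)]
            target = pr + self.gamma * max(self.Q[(ps2, b)] for b in self.actions)
            self.Q[(ps, pa)] += self.alpha * (target - self.Q[(ps, pa)])
            for (qs, qa) in self.predecessors[ps]:
                self._push(qs, qa)

An ε-greedy action choice and an environment loop would wrap observe(); concentrating backups where the Bellman error is largest, rather than sweeping all states, is the effect the thesis measures against the traditional Dyna architecture.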
Keywords/Search Tags: reinforcement learning, planning, topological sequence, VI-TS, prioritized sweeping, Dyna-PS