
Research on Reinforcement Learning Based on Manifold Learning

Posted on: 2021-04-03    Degree: Master    Type: Thesis
Country: China    Candidate: J F Pan    Full Text: PDF
GTID: 2428330623468211    Subject: Computer Science and Technology
Abstract/Summary:
Representation learning and option discovery are two of the biggest challenges in reinforcement learning. In recent years, hierarchical reinforcement learning has made significant progress on the "curse of dimensionality". Its main idea is to decompose a task hierarchically into subtasks so as to accelerate an agent's learning and planning. Current hierarchical decompositions, however, are mostly hand-designed from prior knowledge rather than generated automatically, and in dynamic, complex domains it is difficult to build a hierarchy from prior knowledge alone. The automatic discovery of subtasks has therefore become a hot topic in hierarchical reinforcement learning.

Manifold learning, an important technique for feature representation and dimensionality reduction, has been widely studied in pattern recognition. Feature representation is critical not only for pattern recognition tasks but also for sequential decision problems with large or continuous state spaces. For reinforcement learning algorithms it is therefore necessary to study feature learning methods with different properties, so as to obtain good performance in different situations. Addressing problems and shortcomings of current option discovery methods, this thesis focuses on policy construction and option construction and proposes corresponding improved algorithms. The main research work is as follows:

First, for the option discovery problem, this thesis proposes an improved automatic option discovery algorithm based on Laplacian eigenmaps. The algorithm uses proto-value functions (PVFs) to discover options automatically. By defining the concepts of eigenpurpose and eigenbehavior, the options discovered from eigenpurposes traverse the principal directions of the state space. The discovered options operate at different time scales and can easily be sequenced, which aids exploration. In addition, the proposed algorithm uses an ε-greedy policy to balance exploration and exploitation: the policy chooses between primitive actions and options, which helps the agent explore the whole state space and thereby improves exploration.

Second, current eigenoption discovery algorithms cannot be combined with representation learning, and existing eigenoptions can only be used in environments whose states can be enumerated. To address these problems, this thesis introduces the idea of using representation learning methods to guide the option discovery process, estimating the DIF (diffusion information flow) model through the successor representation (SR). Building on automatic option discovery, a new reward function is defined that exploits the equivalence between PVFs and the SR, and an algorithm is proposed that discovers eigenoptions while learning representations: the learned SR replaces the combinatorial Laplacian matrix and is used to discover eigenoptions. Experiments show that the eigenoptions obtained by approximating the DIF model through the SR genuinely help the agent explore the environment, and the faster the SR can be estimated, the more valuable this approximation becomes.
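To make the PVF-based construction concrete, the following is a minimal sketch, assuming a small tabular MDP with a known symmetric adjacency matrix A. The function names and the choice of the normalized Laplacian are illustrative assumptions, not taken verbatim from the thesis.

```python
# Sketch: eigenpurposes as the smoothest eigenvectors of the graph
# Laplacian (PVFs), and the intrinsic reward defining an eigenbehavior.
import numpy as np

def eigenpurposes(A, n_options):
    """Return the n_options smoothest eigenvectors of the normalized
    graph Laplacian; each column e is one eigenpurpose."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt  # normalized Laplacian
    vals, vecs = np.linalg.eigh(L)                    # ascending eigenvalues
    return vecs[:, :n_options]                        # smoothest eigenvectors

def intrinsic_reward(e, s, s_next):
    """Intrinsic reward r^e(s, s') = e[s'] - e[s] (tabular one-hot
    features); the option policy maximizing it is the eigenbehavior."""
    return e[s_next] - e[s]
```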
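The ε-greedy choice between primitive actions and options described above can be sketched as follows; the layout of `q_values` (primitive actions first, then options) and the use of a NumPy random generator are assumptions for illustration.

```python
# Sketch: ε-greedy selection over the union of primitive actions
# and discovered options.
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """q_values: estimated values over primitive actions followed by
    options. With probability ε act uniformly at random (explore);
    otherwise pick the highest-valued action or option (exploit)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))
```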
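For the SR-based variant, a hedged tabular sketch is given below: the SR matrix Psi is learned by TD(0), and the eigenvectors of the (symmetrized) learned SR stand in for the Laplacian eigenvectors when discovering eigenoptions. The symmetrization step and the function names are assumptions, not the thesis's exact procedure.

```python
# Sketch: learning the successor representation by TD(0) and using its
# top eigenvectors in place of PVFs to discover eigenoptions.
import numpy as np

def td_update_sr(Psi, s, s_next, alpha, gamma):
    """One TD(0) update of the SR row for state s:
    Psi(s, .) <- Psi(s, .) + alpha * (1{s} + gamma * Psi(s', .) - Psi(s, .))."""
    one_hot = np.zeros(Psi.shape[1])
    one_hot[s] = 1.0
    Psi[s] += alpha * (one_hot + gamma * Psi[s_next] - Psi[s])
    return Psi

def eigenoptions_from_sr(Psi, n_options):
    """Eigendecompose the symmetrized learned SR; its largest-eigenvalue
    eigenvectors correspond to the Laplacian's smoothest ones."""
    sym = 0.5 * (Psi + Psi.T)
    vals, vecs = np.linalg.eigh(sym)
    return vecs[:, -n_options:]
```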
Keywords/Search Tags: Manifold learning, reinforcement learning, options, Laplacian eigenmaps, representation learning