
Reinforcement Learning Based On Spectral Graph Theory

Posted on: 2013-11-16  Degree: Doctor  Type: Dissertation
Country: China  Candidate: M Q Zhu  Full Text: PDF
GTID: 1228330392454401  Subject: Control theory and control engineering
Abstract/Summary:
As an effective method for solving sequential decision-making problems, reinforcement learning suffers from the curse of dimensionality when applied to large-scale or continuous-space problems. Overcoming the curse of dimensionality and improving algorithmic efficiency are the main open problems in reinforcement learning at present. In recent years, spectral graph theory, a mathematical tool that can discover the topological structure of high-dimensional data spaces, has been applied with great success in fields such as complex networks, image and vision, and manifold learning. Introducing spectral graph theory into reinforcement learning therefore has significant research value.

To improve algorithmic efficiency, this dissertation studies how to apply spectral graph theory in three areas: hierarchical reinforcement learning, heuristic reinforcement learning based on manifold distance, and transfer learning for reinforcement learning. First, a new subtask policy computation method and two modified task decomposition methods are proposed for hierarchical reinforcement learning. Then, for tasks that search for a target location, a framework of heuristic reinforcement learning based on distance metric learning is established. Within this framework, the Laplacian Eigenmap is applied in three aspects, namely reward shaping, heuristic policy selection, and heuristic Dyna planning, and three categories of heuristic reinforcement learning algorithms are put forward. Finally, to address the shortcomings of basis function transfer based on spectral graph theory, a hybrid transfer method integrating basis functions with subtask optimal policies is designed for transfer learning in reinforcement learning. The main contributions of this dissertation include:

1. The Option method consists of task decomposition and subtask policy computation.
For task decomposition, existing Option methods based on spectral graph partitioning require the number of subtasks to be specified by hand and have a limited range of application. The dissertation analyzes the reasons for this and puts forward two modified automatic Option decomposition algorithms by introducing multiple spectral clustering and the eigengap method. For subtask policy computation, existing methods generally treat it as a new reinforcement learning problem. Exploiting the fact that the Laplacian Eigenmap preserves the local topological structure of the state space, the dissertation proposes a new subtask policy computation method, namely the virtual value function method.

2. For learning tasks defined by a target location, a generalized distance is often used as the heuristic function in three areas: the design of heuristic reward functions, the selection of heuristic actions, and heuristic Dyna planning. The key question is how to define the generalized distance according to the properties and structure of the task. For tasks whose value functions are discontinuous in Euclidean space but continuous on some manifold, a framework of heuristic reinforcement learning based on distance metric learning is built.

3. Heuristic reward functions are designed by two types of method, namely the generalized distance method and the abstract model method. Following the generalized distance approach, within the established heuristic reinforcement learning framework, the dissertation uses the simplest Laplacian Eigenmap from spectral graph theory to obtain a new design method for heuristic reward functions. Based on the abstract model method and the two modified automatic Option decomposition algorithms above, the dissertation proposes two improved reward function design methods, both of which can adaptively decompose subtask potential functions.

4.
Under the established framework of heuristic reinforcement learning, and aiming at policy selection and Dyna planning, a new heuristic action selection method and an improved Dyna-Q planning algorithm are put forward. Both methods speed up the initial learning performance of Q-learning.

5. In scaled-up state-space transfer under the proto-value function framework based on spectral graphs, only the basis functions corresponding to the smaller eigenvalues transfer effectively. However, having few effective basis functions causes approximation error in the value functions of the target task. The dissertation analyzes the cause of this approximation error and designs a hybrid transfer method integrating basis function transfer with the transfer of subtask optimal policies. The proposed hybrid method directly obtains the optimal policies of some states, reduces the number of iterations, and reduces the minimum number of basis functions needed to approximate the value functions. The method is suitable for scaled-up state-space transfer tasks with hierarchical control structures.

Reinforcement learning consists of four elements: model, reward, value function, and policy. Centering on these four elements, the dissertation studies several algorithms based on spectral graph theory, analyzes their application ranges and computational complexities, and verifies their effectiveness and applicability through simulation experiments.
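To make the central tool concrete, the following is a minimal sketch, not taken from the dissertation, of the kind of computation that underlies both the spectral Option decomposition and the Laplacian Eigenmap basis functions described above: build the normalized graph Laplacian of a state-transition graph, take its smoothest nontrivial eigenvectors as an embedding, and use the sign pattern of the Fiedler vector to split the graph into two subtasks at its bottleneck. The function name and the six-state toy graph are illustrative assumptions.

```python
import numpy as np

def laplacian_eigenmap(adjacency, k):
    """Embed graph nodes with eigenvectors of the normalized Laplacian.

    adjacency: symmetric (n, n) weight matrix of the state-transition graph
    k: number of embedding dimensions (smoothest nontrivial eigenvectors)
    """
    degrees = adjacency.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(degrees))
    # Normalized graph Laplacian: L = I - D^{-1/2} W D^{-1/2}
    laplacian = np.eye(len(degrees)) - d_inv_sqrt @ adjacency @ d_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(laplacian)   # ascending eigenvalues
    # Skip the trivial eigenvector (eigenvalue ~ 0); keep the k smoothest.
    return eigvals[1:k + 1], eigvecs[:, 1:k + 1]

# Toy state graph: two 3-state cliques joined by one weak "bottleneck" edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1    # bottleneck between the two regions

vals, vecs = laplacian_eigenmap(W, 2)
# The Fiedler vector (first returned column) changes sign at the bottleneck,
# so thresholding it at zero partitions the states into two subtasks.
partition = vecs[:, 0] > 0
```

In this sketch the partition separates states {0, 1, 2} from {3, 4, 5}, i.e. it cuts the graph at the weak edge, which is the basic mechanism a spectral Option decomposition exploits; the eigengap between consecutive eigenvalues can likewise suggest how many subtasks to form.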
Keywords/Search Tags: Hierarchical Reinforcement Learning, Heuristic Reinforcement Learning, Transfer Learning for Reinforcement Learning, Spectral Graph Theory, Multiple Spectral Graph Partition, Distance Metric Learning, Laplacian Eigenmap