
Reinforcement Learning Control Methods Based on a Prior Knowledge Model: Studies and Implementation

Posted on: 2020-02-07  Degree: Master  Type: Thesis
Country: China  Candidate: W X Wei  Full Text: PDF
GTID: 2518306548493654  Subject: Computer Science and Technology
Abstract/Summary:
Because standard reinforcement learning starts training without any prior knowledge, training time grows rapidly as task complexity increases, which limits the scenarios in which reinforcement learning can be applied. It is therefore of great theoretical significance to supply the training results of similar tasks to a new reinforcement learning task as a prior knowledge model, so as to reduce reinforcement learning's dependence on large numbers of training samples and to improve the usability and generality of reinforcement learning algorithms. This thesis carries out the following research and contributions: (1) a policy transfer method based on a prior knowledge model; (2) a reinforcement learning algorithm framework based on the prior knowledge model; (3) a task similarity measure based on dynamics models.

The prior knowledge model is designed around the training characteristics of model-based and model-free reinforcement learning algorithms: it transfers the model-based control policy into a neural network through imitation learning, and this network then serves as the initial policy that the model-free algorithm continues to optimize. For the case where the source task and the target task are the same, this thesis implements an algorithm framework based on the prior knowledge model that combines the advantages of model-based and model-free reinforcement learning. The framework provides a robust way to use model-based and model-free algorithms together, and it performs well on a forward-locomotion posture learning task for a simulated agent, achieving up to 3 times the sample efficiency of random policy initialization.

For scenarios with a low-dimensional source task and a high-dimensional target task, this thesis further proposes a task similarity measure based on dynamics models. In the proposed framework, the dynamics models of different tasks are implemented as neural networks with the same hidden-layer structure. By measuring the difference between their hidden-layer weight matrices, the dynamics models of the source and target tasks can be compared and a task similarity score obtained. This similarity is used as the combination weight of the low-dimensional prior knowledge model, which keeps knowledge transfer between similar tasks reliable. In the forward-locomotion learning experiment, the proposed framework again achieves up to 2 times the sample efficiency of random policy initialization.
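As an illustration only (not the thesis's own code), the sketch below shows one way the two core ideas described above could be wired up in PyTorch: behaviour-cloning the (state, action) pairs produced by a model-based controller into a policy network that then initializes the model-free learner, and scoring task similarity from the hidden-layer weight matrices of two dynamics networks that share the same structure. All class and function names (PolicyNet, clone_model_based_policy, task_similarity) are assumptions introduced for this sketch.

    import torch
    import torch.nn as nn

    class PolicyNet(nn.Module):
        """Small MLP; the single hidden layer mirrors the assumption that all
        dynamics/policy networks share the same hidden-layer structure."""
        def __init__(self, in_dim, out_dim, hidden=64):
            super().__init__()
            self.hidden = nn.Linear(in_dim, hidden)
            self.out = nn.Linear(hidden, out_dim)

        def forward(self, x):
            return self.out(torch.tanh(self.hidden(x)))

    def clone_model_based_policy(policy, demos, epochs=50, lr=1e-3):
        """Imitation-learning step: regress the network onto (state, action)
        pairs collected from the model-based controller on the source task.
        The returned network serves as the initial policy of the model-free stage."""
        states, actions = demos          # tensors of shape [N, obs_dim], [N, act_dim]
        opt = torch.optim.Adam(policy.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(policy(states), actions)
            loss.backward()
            opt.step()
        return policy

    def task_similarity(dyn_source, dyn_target):
        """Compare two dynamics networks with identical hidden-layer shapes by the
        normalised difference of their hidden-layer weight matrices; values near 1
        indicate similar tasks and give a larger combination weight to the
        low-dimensional prior knowledge model."""
        w_s = dyn_source.hidden.weight.detach()
        w_t = dyn_target.hidden.weight.detach()
        diff = torch.norm(w_s - w_t) / (torch.norm(w_s) + torch.norm(w_t) + 1e-8)
        return float(1.0 - diff)

In this sketch the similarity score would simply scale how strongly the cloned low-dimensional policy is trusted when initializing the high-dimensional target task; the thesis's actual weighting scheme may differ.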
Keywords/Search Tags:Reinforcement Learning, Imitation Learning, Prior Knowledge Model