
Research On Performance Of Actor-Critic-based Fusion Algorithm In Classical Control Problems

Posted on: 2020-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: D W Wang
Full Text: PDF
GTID: 2518306104995509
Subject: Software engineering
Abstract/Summary:
Deep reinforcement learning has achieved strong results in a growing number of fields, but in most cases a model trained for one task does not perform well on a new task. Meta-learning theory holds that a deep learning model can use prior knowledge to learn new tasks quickly; its combination with reinforcement learning is called meta-reinforcement learning. Building on Actor-Critic, this thesis first examines how a Double-Critic model, constructed from an action value network and a state value network, performs on the same task and on other similar tasks, and analyzes the results. A Meta-Critic model is then constructed by combining the Double-Critic model with a task encoder, and a pre-trained meta-model is obtained by training it on different tasks with different policy networks. When a new task is given, the action value network in the meta-model can be regarded as a prediction network. Before the agent makes a decision, the expectation of the next state value is calculated from the action values provided by the prediction network, and the current policy is updated according to that expectation, so that the agent explores the new task as quickly as possible and converges to the optimal policy. The target value in the loss function of the action value network is given by the state value network. This makes the update of the Meta-Critic model independent of the actions predicted by the policy network, which further improves the stability of model adjustment. Finally, this model and other algorithms are tested on several new tasks and their performance is compared. The experimental results show that the model performs better on the new tasks, which indicates that the Meta-Critic model, by learning from existing tasks, can effectively guide the policy network in a new task. In future work, the model is expected to incorporate ideas from offline learning algorithms to make full use of existing data, so that pre-training becomes faster and more stable.
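The abstract does not state its formulas, so the following is only a minimal sketch of the two ideas it describes, under stated assumptions. It assumes the target supplied by the state value network takes the common one-step form r + gamma * V(s'), that the action value loss is a mean squared error, and that the "expectation" is a policy-weighted average of the predicted action values. The function names, the discount `gamma`, and the `dones` flags are hypothetical and do not come from the thesis.

```python
from typing import List, Sequence


def critic_targets(rewards: Sequence[float],
                   next_state_values: Sequence[float],
                   dones: Sequence[bool],
                   gamma: float = 0.99) -> List[float]:
    """Targets for the action value network, built from state values.

    Assumed form: r + gamma * V(s'), with no bootstrap on terminal steps.
    No action from the policy network is needed to build the target.
    """
    return [r + gamma * v * (0.0 if d else 1.0)
            for r, v, d in zip(rewards, next_state_values, dones)]


def action_value_loss(q_values: Sequence[float],
                      targets: Sequence[float]) -> float:
    """Mean squared error between Q(s, a) and the state-value targets."""
    return sum((q - t) ** 2 for q, t in zip(q_values, targets)) / len(q_values)


def expected_action_value(action_probs: Sequence[float],
                          action_values: Sequence[float]) -> float:
    """Policy-weighted average of the predicted action values (assumed form)."""
    return sum(p * q for p, q in zip(action_probs, action_values))


# Example with two transitions; the second one ends the episode.
targets = critic_targets([1.0, 0.0], [2.0, 5.0], [False, True], gamma=0.5)
loss = action_value_loss([1.0, 1.0], targets)
expectation = expected_action_value([0.25, 0.75], [4.0, 0.0])
```

Because the target depends only on the reward and the state value of the next state, this sketch reflects the abstract's point that the critic update does not rely on the action the policy network would choose.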
Keywords/Search Tags: Reinforcement learning, Meta-learning, Prediction network, Prior knowledge