Research On Reinforcement Learning Based On Hidden Space Modeling

Posted on: 2022-09-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Liu
Full Text: PDF
GTID: 2518306533477314
Subject: Computer application technology
Abstract/Summary:
Model-based reinforcement learning algorithms can use a known or learned environment model to improve the learning efficiency of the policy network and thus make better use of sample data. They can also use the model for policy planning, which allows precise, forward-looking decisions. If such an algorithm models the high-dimensional environment state directly, it must reconstruct high-dimensional features, which introduces large errors and hinders environment modeling. For this reason, most state-of-the-art model-based reinforcement learning algorithms of recent years model the environment on a hidden space representation of the environment state, and then use the learned model to train the policy network or to carry out policy planning. This approach improves both the efficiency of environment modeling and the robustness of the algorithm. However, most current model-based reinforcement learning algorithms use a simple encoder to obtain the hidden space representation of a single-step environment state, which cannot fully capture the information useful for environment modeling. In addition, existing policy planning algorithms perform poorly on continuous action space tasks: they need a large amount of simulation to find good actions, and they still fall short in both performance and efficiency. Building on the Dreamer algorithm, this thesis studies these two problems and proposes effective solutions. The main contents of this thesis are as follows:

1. Research on an encoder based on the gated recurrent unit. Most current model-based reinforcement learning algorithms use simple encoders to obtain environment state information, and these encoders do not capture enough of that information. This thesis therefore designs an encoder based on the gated recurrent unit. The encoder first uses an encoding network to encode each of a series of consecutive single-step environment states. It then applies a gated recurrent unit to the sequence formed by the per-step encoding outputs, and finally obtains the hidden space representation corresponding to the current environment state. This representation contains not only the static information of the current environment state but also dynamic information that the current state alone cannot reflect, which makes it better suited to environment modeling. The experimental results show that the proposed encoder effectively improves the efficiency and accuracy of environment modeling in high-dimensional environment states.

2. Research on a policy planning algorithm based on tree search and Rollout. Existing policy planning algorithms perform well on discrete action space tasks but have shortcomings on continuous action space tasks. To address this problem, this thesis proposes a policy planning algorithm based on tree search and Rollout. The algorithm combines the ideas of Monte Carlo tree search and the Rollout algorithm, and can use the learned environment model to carry out policy search on continuous action space tasks. The experimental results show that, provided the algorithm can establish an accurate environment model, the proposed policy planning algorithm effectively improves how efficiently the model-based reinforcement learning algorithm explores the environment, thereby improving its overall learning speed and final performance.

This thesis has 36 figures, 5 tables and 100 references.
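The encoder pipeline outlined in item 1 (encode each single-step state, run a gated recurrent unit over the resulting sequence, and take the final hidden state as the hidden space representation) can be illustrated with a minimal sketch. This is not the thesis's implementation: the class name `GRUEncoder`, the single tanh layer standing in for the encoding network, the omission of bias terms, the random untrained weights and all dimensions are assumptions chosen only to keep the example self-contained.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GRUEncoder:
    """Hypothetical sketch: per-step encoding followed by a GRU over the sequence."""

    def __init__(self, obs_dim, embed_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Assumed stand-in for the encoding network: one linear layer + tanh.
        self.W_embed = rand_matrix(embed_dim, obs_dim, rng)
        # Input (W) and recurrent (U) weights for the update gate z,
        # the reset gate r and the candidate state n. Biases are omitted.
        self.W = {g: rand_matrix(hidden_dim, embed_dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrn"}

    def embed(self, obs):
        """Encode one single-step environment state."""
        return [math.tanh(a) for a in matvec(self.W_embed, obs)]

    def gru_step(self, x, h):
        """One gated-recurrent-unit update of the hidden state h given input x."""
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        gated_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], gated_h))]
        # Blend the candidate state with the previous hidden state.
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def encode(self, observations):
        """Return the final hidden state for a sequence of observations."""
        h = [0.0] * self.hidden_dim
        for obs in observations:
            h = self.gru_step(self.embed(obs), h)
        return h


# Two consecutive (made-up) observations; the latent summarises both.
encoder = GRUEncoder(obs_dim=4, embed_dim=3, hidden_dim=2)
frames = [[0.0, 0.1, 0.2, 0.3], [0.1, 0.2, 0.3, 0.4]]
latent = encoder.encode(frames)
```

Because the hidden state is carried across steps, the final vector depends on the order of the observations as well as their content, which is one way to read the "dynamic information" the abstract refers to. In practice the weights would be learned jointly with the environment model rather than drawn at random.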
Keywords/Search Tags: Reinforcement learning, Model-based reinforcement learning, Gated recurrent unit, Encoder, Policy planning