
Research On Generative Model Based World Model Establishment And Intelligent Decision-making Algorithm

Posted on: 2021-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Zhang
Full Text: PDF
GTID: 2428330614450176
Subject: Mechanical and electrical engineering
Abstract/Summary:
Intelligent decision-making has always been one of the key technologies of robotics. Robot applications are shifting from structured factory environments and tasks to complex everyday environments such as households, office buildings, roads, and fields, and to more challenging tasks. This shift places higher demands on robot intelligence. Existing intelligent algorithms rely heavily on researchers modeling the environment and the robot body in advance, and the modeling must be redone for each new problem. As a result, these algorithms do not generalize and require a great deal of manual labor, so they cannot meet the growing industrial demand for intelligence. This thesis aims to explore a general description of, and general solutions to, intelligent decision-making problems.

First, based on the partially observable Markov decision process (POMDP) commonly used in reinforcement learning, a general mathematical description of intelligent decision-making problems is established. Analysis of this description shows that an intelligent decision-making algorithm is equivalent to the extraction and utilization of information. Information theory is used to analyze how information is distributed in the environment, which leads to a general framework for solving intelligent decision-making problems based on the concept of the world model. According to the type of information extracted, the world model is divided into two processes: perceptual abstraction and state prediction. Five typical visual control tasks on the MuJoCo simulation platform are selected as the verification platform for this thesis.

Secondly, the relationship between the perceptual abstraction process and generative models is derived. The internal constraints of the POMDP are used to transform perceptual abstraction into a generation problem, and the process is implemented with a variational autoencoder. The information constraints in the optimization objective are analyzed theoretically, and a flow model is used to replace the prior distribution of the variational autoencoder to achieve better extraction of static information. The ability of these methods to extract static information about the environment is verified on two typical tasks.

Thirdly, the relationship between the state prediction process and generative models is derived. The internal constraints of the POMDP are used to transform state prediction into a sequence generation problem, and the process is implemented with a recurrent neural network. Three models, RAR, RVAR, and RVAE, are proposed according to the different node forms of the belief variables and the trajectory optimization methods. The ability of these methods to extract and predict dynamic information about the environment is verified on two typical tasks.

Finally, mimicking the way humans make intelligent decisions, an imaginative learning method based on the world model and the actor-critic framework is proposed, and the learned world model is used to generate human-like, interpretable decisions. The framework is systematically verified on all five typical simulation tasks. The results show that the algorithm is effective and that, compared with other reinforcement learning methods, it greatly improves sample efficiency. Training the agent in offline mode is also explored, and the results show that the feedback process of data collection in the framework is crucial to the agent's performance.
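The abstract does not state the optimization objective itself. For orientation only, the sketch below computes the two terms of the standard variational autoencoder objective: a reconstruction term and the KL "information constraint" between a diagonal Gaussian posterior and a standard normal prior. The function names, the squared-error reconstruction term, and the `beta` weight are illustrative assumptions, not taken from the thesis. In particular, the thesis replaces the standard normal prior with a flow model, which this baseline sketch does not do.

```python
import math


def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions.

    Closed form per dimension: 0.5 * (mu^2 + sigma^2 - 1 - log sigma^2).
    """
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def negative_elbo(x, x_recon, mu, log_var, beta=1.0):
    """Negative ELBO up to additive constants (hypothetical helper).

    Assumes a unit-variance Gaussian decoder, so the reconstruction term
    reduces to a sum of squared errors. `beta` scales the KL term and is
    an illustrative knob, not a parameter named in the abstract.
    """
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return reconstruction + beta * gaussian_kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    # A posterior equal to the prior carries no information about the input.
    print(gaussian_kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))
    # Shifting the posterior mean away from the prior increases the KL term.
    print(gaussian_kl_to_standard_normal([1.0], [0.0]))
```

The KL term is zero only when the posterior matches the prior, so it acts as a cost on how much information the latent code carries about the observation. This is one common reading of an "information constraint" in this setting.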
Keywords/Search Tags:World Model, Generative Model, Reinforcement Learning, Deep Learning, Representation Learning, Intelligent Decision-Making