
Research On Intelligent Exploration Algorithm Of Reinforcement Learning

Posted on: 2021-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: Q J Yang
Full Text: PDF
GTID: 2428330623967879
Subject: Control Science and Engineering
Abstract/Summary:
Exploration and exploitation are the main characteristics that distinguish reinforcement learning from other learning methods. Reinforcement learning algorithms need exploration strategies to balance exploration and exploitation during the learning process. An efficient exploration algorithm helps the agent converge to a good policy faster while interacting with the environment and improves training efficiency. This thesis studies the familiarity of actions, or of state-action pairs, and uses it in exploration algorithms, proposing three different exploration algorithms for three different environments. The main contributions are as follows.

(1) Summarize the advantages and disadvantages of the four current families of exploration algorithms: perturbation-based exploration, optimism under uncertainty, curiosity-driven exploration, and Thompson sampling; affirm the idea of optimism under uncertainty; and propose an exploration algorithm based on a state-action familiarity signal.

(2) Analyze the relationship between the mean and the variance change value of the model, and propose an exploration algorithm that uses the variance change value as the familiarity measure. Experimental comparisons in a variety of multi-armed bandit environments show that this algorithm obtains a higher total reward under a limited number of arm pulls than classic exploration algorithms. A derivation and proof of a lower bound on the regret in the two-armed Bernoulli bandit is also given. (A toy sketch of such a selection rule appears after this abstract.)

(3) Extend the variance-change-based action selection algorithm to MDPs. First, improve the VTD algorithm so that it can estimate the variance change value of a state-action pair; then use this estimate to design an action selection algorithm for the exploration problem in the MDP setting. Experiments against classic algorithms in a hard-exploration environment show that the algorithm obtains higher rewards within a limited number of episodes. (A second sketch below illustrates the idea.)

(4) Propose an action selection algorithm for complex environments based on the idea of network distillation. Estimating the variance of a state-action pair in a complex environment is difficult, so this thesis redesigns the evaluation of state-action familiarity: a distilled network model is constructed to estimate the familiarity of a state-action pair, and the action selection algorithm is built on top of it. Experiments compare the influence of different parameters on the results and compare the algorithm with others in different Gym environments; the results show that the algorithm obtains better rewards in some environments that demand exploration. (A third sketch below illustrates the distillation signal.)
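To make contribution (2) concrete: the toy selection rule below scores each arm of a Bernoulli bandit by its posterior mean plus a bonus equal to the expected drop in Beta-posterior variance that one more pull would cause. The thesis's exact variance-change formula is not given in this abstract, so this bonus, the weight c, and the Beta(1, 1) priors are illustrative assumptions, not the thesis's algorithm.

    import numpy as np

    def beta_var(a, b):
        # Variance of a Beta(a, b) posterior.
        return a * b / ((a + b) ** 2 * (a + b + 1))

    def select_arm(alpha, beta, c=1.0):
        mean = alpha / (alpha + beta)
        var_now = beta_var(alpha, beta)
        # Expected posterior variance after one more pull, averaged over outcomes.
        var_next = mean * beta_var(alpha + 1, beta) + (1 - mean) * beta_var(alpha, beta + 1)
        bonus = var_now - var_next  # large for unfamiliar arms
        return int(np.argmax(mean + c * bonus))

    def run(true_probs, pulls=1000, seed=0):
        rng = np.random.default_rng(seed)
        k = len(true_probs)
        alpha, beta = np.ones(k), np.ones(k)  # Beta(1, 1) priors
        total = 0.0
        for _ in range(pulls):
            arm = select_arm(alpha, beta)
            reward = float(rng.random() < true_probs[arm])
            alpha[arm] += reward
            beta[arm] += 1.0 - reward
            total += reward
        return total

An arm whose posterior is still wide promises a large variance drop, so the rule keeps pulling it until it becomes familiar, which matches the optimism-under-uncertainty behavior the abstract endorses.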
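For contribution (3), the improved VTD estimator itself is not described in the abstract. The toy Q-learner below stands in for it: it tracks a running variance of the TD error per state-action pair and uses the magnitude of its latest change as an optimism bonus at action selection time. The EMA update, the optimistic initial values, and the bonus weight c are assumptions for illustration only.

    import numpy as np
    from collections import defaultdict

    class VarianceChangeQLearner:
        def __init__(self, n_actions, alpha=0.1, gamma=0.99, c=1.0):
            self.q = defaultdict(lambda: np.zeros(n_actions))
            self.var = defaultdict(lambda: np.ones(n_actions))   # optimistic start
            self.dvar = defaultdict(lambda: np.ones(n_actions))  # last variance change
            self.alpha, self.gamma, self.c = alpha, gamma, c

        def act(self, s):
            # Optimism under uncertainty: unfamiliar pairs keep a large bonus.
            return int(np.argmax(self.q[s] + self.c * self.dvar[s]))

        def update(self, s, a, r, s_next, done):
            target = r + (0.0 if done else self.gamma * self.q[s_next].max())
            delta = target - self.q[s][a]
            self.q[s][a] += self.alpha * delta
            # EMA of the squared TD error as a crude variance proxy; its change
            # shrinks as the pair (s, a) becomes familiar.
            new_var = (1 - self.alpha) * self.var[s][a] + self.alpha * delta ** 2
            self.dvar[s][a] = abs(new_var - self.var[s][a])
            self.var[s][a] = new_var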
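Contribution (4) measures familiarity with network distillation. The abstract does not specify the architecture, so the sketch below follows the general distillation recipe (a trained predictor chasing a fixed random target, in the spirit of random network distillation): prediction error is large on unfamiliar states and shrinks with experience. Layer sizes and the choice of optimizer are assumptions.

    import torch
    import torch.nn as nn

    def make_net(obs_dim, feat_dim=64):
        return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                             nn.Linear(128, feat_dim))

    class DistillationFamiliarity:
        def __init__(self, obs_dim, lr=1e-4):
            self.target = make_net(obs_dim)       # fixed random network
            for p in self.target.parameters():
                p.requires_grad_(False)
            self.predictor = make_net(obs_dim)    # trained to imitate the target
            self.opt = torch.optim.Adam(self.predictor.parameters(), lr=lr)

        def bonus(self, obs):
            # Per-state distillation error; high error = unfamiliar state.
            with torch.no_grad():
                t = self.target(obs)
            p = self.predictor(obs)
            err = ((p - t) ** 2).mean(dim=-1)
            self.opt.zero_grad()
            err.mean().backward()                 # predictor catches up over time
            self.opt.step()
            return err.detach()                   # exploration bonus per state

Adding this bonus to the extrinsic reward (or to the action scores) gives the familiar pattern: novel states are visited eagerly, then fade from the bonus as the predictor learns them.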
Keywords/Search Tags: Reinforcement Learning, Exploration Algorithm, Markov Decision Process, Variance Change Value, Familiarity of State Action, Optimism under Uncertainty