Research On Decision Distribution Modeling In Reinforcement Learning

Posted on: 2021-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhou
Full Text: PDF
GTID: 2428330647951074
Subject: Computer Science and Technology
Abstract/Summary:
Reinforcement learning is an important research branch in the field of intelligent decision-making, in which an agent is trained to perform a task through trial and error. Combined with neural networks, this technique has recently led to a wide range of successes in learning policies for different tasks, such as defeating the best human player at the game of Go, outperforming humans in 49 Atari games, and solving a variety of difficult robotics tasks. However, the agent often needs a large amount of interaction with the environment, which results in low sample efficiency. Generally speaking, feedback in reinforcement learning is delayed: although the environment gives feedback at every step, the cumulative return that is ultimately optimized lags behind the decisions that produce it, which leads to low sample efficiency. We find that decision distribution modeling can effectively alleviate the problem of feedback lag. This thesis studies two effective approaches to feedback lag, curriculum learning and imitation learning, and improves the performance of both by adding decision distribution modeling. The following results have been achieved:

1. In curriculum reinforcement learning, previous methods do not let the agent distinguish between sub-problems it has already learned and those it has not, which leads to unnecessary exploration on learned sub-problems and wasted exploration samples. To solve this problem, this thesis proposes a decision distribution modeling method in curriculum space that distinguishes different sub-problems in order to control the exploration strength of the agent. Experiments in different environments verify that the method is a generally applicable technique for speeding up curriculum reinforcement learning, both with hand-designed curricula and with HER.

2. In imitation reinforcement learning, previous methods cannot cope with environmental changes, resulting in insufficient generalization. To solve this problem, an environment-space decision distribution modeling method is proposed, which can generate effective decision models from limited expert data. The method is used to simulate the behavior distribution of pickers in a warehouse dispatch application, with the aim of optimizing dispatch decisions. In offline experiments and online A/B tests, our learning-based approach shows a significant improvement over the traditional approach based on expert knowledge. Notably, picking efficiency improves by approximately 10% in the real environment when using our learned objective function, which brings substantial economic benefits.
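
The feedback lag described in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the function name `discounted_returns`, the discount factor `gamma`, and the five-step sparse-reward episode are all assumptions chosen for illustration. The sketch computes the cumulative return from each step of one episode and shows that early decisions are only evaluated through a reward that arrives several steps later.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Compute the return-to-go G_t = r_t + gamma * G_{t+1} for one episode.

    `gamma` is an assumed discount factor; the abstract does not specify one.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical episode: the environment responds at every step,
# but only the final step carries a non-zero reward.
episode_rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
returns = discounted_returns(episode_rewards, gamma=0.9)

for t, (r, g) in enumerate(zip(episode_rewards, returns)):
    print(f"step {t}: reward={r:.1f}  return-to-go={g:.4f}")
```

In this toy episode the per-step reward at step 0 is zero, yet its return-to-go is non-zero, because the quantity being optimized depends on a reward received four steps later. That gap between an action and the signal that evaluates it is the lag the abstract refers to.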
Keywords/Search Tags:machine learning, deep learning, reinforcement learning, imitation learning