Research On Decision Distribution Modeling In Reinforcement Learning

Posted on: 2021-02-18

Degree: Master

Type: Thesis

Country: China

Candidate: Y Zhou

Full Text: PDF

GTID: 2428330647951074

Subject: Computer Science and Technology

Abstract/Summary:

Reinforcement learning is an important research branch in the field of intelligent decision-making. An agent can be trained to perform a task through trial-and-error learning. Combined with neural networks, this technique has recently led to a wide range of successes in learning policies for different tasks, such as defeating the best human player at the game of Go, outperforming humans in 49 Atari games, and solving a variety of difficult robotics tasks. However, in reinforcement learning the agent often needs a large amount of interaction with the environment, which results in low sample efficiency. Generally speaking, the feedback in reinforcement learning is delayed: although the environment gives feedback at every step, the cumulative return that is ultimately optimized only becomes known some time later, and this lag is a cause of the low sample efficiency. We find that decision distribution modeling can effectively alleviate the problem of feedback lag. This thesis studies two effective approaches to feedback lag, curriculum learning and imitation learning, and improves the performance of both by adding decision distribution modeling. The following results have been achieved:

1. In curriculum reinforcement learning, previous methods do not let the agent distinguish between sub-problems it has already learned and sub-problems it has not yet learned, which leads to unnecessary exploration on the learned sub-problems and wastes exploration samples. To solve this problem, this thesis proposes a decision distribution modeling method in curriculum space that distinguishes different sub-problems and thereby controls the exploration strength of the agent. Experiments in different environments verify that the method is a generally applicable technique for speeding up curriculum reinforcement learning, both with hand-designed curricula and with HER.

2. In imitation reinforcement learning, previous methods cannot cope with environmental changes, which results in insufficient generalization. To solve this problem, a decision distribution modeling method in environment space is proposed, which can generate effective decision models from limited expert data. The method is used to simulate the behavior distribution of pickers in a warehouse dispatch application, with the aim of optimizing the dispatch decision. In offline experiments and online A/B tests, the learning-based approach shows a significant improvement over the traditional approach based on expert knowledge. Notably, picking efficiency improves by approximately 10% in the real environment when the learned objective function is used, which brings substantial economic benefits.
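
The feedback lag described in the abstract can be illustrated with a minimal sketch of how a cumulative return is computed. This is not code from the thesis: the function name, the discount factor `gamma`, and the example rewards are assumptions chosen for illustration. The sketch uses the standard discounted return, G_t = r_t + gamma * G_{t+1}, and shows that with a reward arriving only at the end of an episode, the return at early steps is known only after the episode finishes.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} for every step t of one episode.

    `gamma` is an illustrative discount factor, not a value from the thesis.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards: the return at step t depends on all later rewards,
    # so it cannot be computed until the episode has ended.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical sparse-reward episode: only the final step is rewarded.
episode_rewards = [0.0, 0.0, 0.0, 1.0]
episode_returns = discounted_returns(episode_rewards, gamma=0.5)
print(episode_returns)
```

In this hypothetical episode the per-step rewards are zero until the last step, yet every earlier step has a nonzero return, which is the quantity the agent actually has to optimize.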

Keywords/Search Tags:

machine learning, deep learning, reinforcement learning, imitation learning

Related items

1	Supervised Reinforcement Learning: Methods And Applications
2	Reinforcement Learning Agent Design Based On Deep Perception And Imitation Learning
3	Research On Machine Learning Algorithms Based On Planning Network Model
4	Inverse Reinforcement Learning And Imitation Learning With Applications In Intelligent Robotics
5	Research On Reinforcement Learning Method For Game Manipulation Behavior Imitation
6	Optimization For Generative Modeling And Its Applications In Imitation Learning
7	End-To-End Active Tracking System Via Deep Reinforcement Learning
8	Deep Reinforcement Learning For Robotic Cooperation
9	Research On Data Center Network Traffic Scheduling Based On Deep Reinforcement Learning
10	Research On Policy Learning Via Imitation