Deconfounded Environment Reconstruction For Reinforcement Learning Based Recommendation

Posted on: 2020-07-04
Degree: Master
Type: Thesis
Country: China
Candidate: W J Shang
GTID: 2428330575958133
Subject: Computer Science and Technology
Abstract/Summary:
Reinforcement learning (RL), an important research area in machine learning, aims to find the best policy for decision making. Training a policy with reinforcement learning requires large-scale trial and error in a specific environment, and it relies on the "exploration and exploitation" mechanism to optimize the policy. In many real-world applications, however, training the policy in the real environment faces obstacles that make such large-scale sampling infeasible. Reconstructing the environment from historical data is therefore an appealing way to bring reinforcement learning to real-world applications. However, real-world applications are often too complex to offer fully observable environment information, which implies the existence of hidden confounders. Although existing reinforcement learning methods can handle such problems, they ignore the hidden confounding bias behind the observed data, which makes it difficult to reach optimal performance. This thesis proposes a framework of environment reconstruction for reinforcement learning based recommendation, and then proposes a deconfounded environment reconstruction algorithm that treats the confounder as a policy. The main contributions are summarized as follows:

1. A framework of environment reconstruction for reinforcement learning based recommendation is defined. In the reconstructed environment, reinforcement learning can proceed more efficiently and with zero physical interaction cost. This is expected to accelerate the deployment of reinforcement learning on sequential recommendation tasks.

2. A novel method, deconfounded multi-agent environment reconstruction (DEMER), is proposed to handle the practical situation in which hidden confounders exist in the environment. The unobservable confounder is treated as a confounding agent. Under the multi-agent imitation learning framework, two techniques, the confounder-embedded policy and the compatible discriminator, are proposed to learn the confounding policy. Experimental results show that DEMER can effectively reconstruct the hidden confounder in addition to the observable agents.

3. DEMER is applied to the large-scale ride-hailing platform of Didi Chuxing. A virtual environment for driver program recommendation is reconstructed, in which the recommendation policy can be optimized by reinforcement learning. Experimental results show that DEMER not only simulates the real world closely but also produces a recommendation policy with significantly improved performance.
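The abstract describes treating the hidden confounder as an agent only at a high level. The Python sketch below is a minimal, hypothetical illustration of that idea and is not the thesis's implementation. A confounder policy H maps (state, action) to a hidden value h that never appears in the logs. An observable agent policy B then maps (state, action, h) to a response b. A GAIL-style discriminator sees only the observable tuple (s, a, b) and turns its judgment into an imitation reward. All class names, the random linear "networks", and the reward convention -log(1 - D) are assumptions chosen for illustration. The logistic discriminator here is a plain stand-in and does not reproduce the thesis's compatible discriminator.

```python
import math
import random


def dot(weights, inputs):
    """Dot product of two equal-length lists."""
    return sum(w * x for w, x in zip(weights, inputs))


class ConfounderEmbeddedPolicy:
    """Hypothetical composition of a hidden confounder policy H and an
    observable agent policy B (illustrative names, not the thesis's code).

    H: (state, action) -> h, a hidden value never recorded in the logs.
    B: (state, action, h) -> b, the observable response.
    Seen from outside, the pair behaves as one policy (state, action) -> b,
    which is the only mapping the logged data can constrain.
    """

    def __init__(self, state_dim, action_dim, seed=0):
        rng = random.Random(seed)
        in_dim = state_dim + action_dim
        # Random linear weights stand in for trained networks (assumption).
        self.w_h = [rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
        self.w_b = [rng.uniform(-1.0, 1.0) for _ in range(in_dim + 1)]

    def confounder(self, state, action):
        # Hidden confounder output, squashed to (-1, 1).
        return math.tanh(dot(self.w_h, state + action))

    def respond(self, state, action):
        # B receives the hidden h as an extra input feature.
        h = self.confounder(state, action)
        b = math.tanh(dot(self.w_b, state + action + [h]))
        return h, b


class LogisticDiscriminator:
    """Hypothetical GAIL-style discriminator D(s, a, b): the estimated
    probability that the tuple came from logged real-world data."""

    def __init__(self, in_dim, seed=1):
        rng = random.Random(seed)
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(in_dim)]

    def prob_real(self, features):
        return 1.0 / (1.0 + math.exp(-dot(self.w, features)))

    def imitation_reward(self, features):
        # Larger when the simulated tuple looks more "real" to D.
        return -math.log(1.0 - self.prob_real(features))


if __name__ == "__main__":
    policy = ConfounderEmbeddedPolicy(state_dim=3, action_dim=2)
    disc = LogisticDiscriminator(in_dim=3 + 2 + 1)
    state, action = [0.2, -0.5, 1.0], [1.0, 0.0]
    h, b = policy.respond(state, action)
    # D sees only observable quantities (s, a, b); h stays hidden.
    reward = disc.imitation_reward(state + action + [b])
    print(f"hidden h={h:.3f}, response b={b:.3f}, reward={reward:.3f}")
```

The discriminator is deliberately given only (s, a, b). Because h is absent from the data, H can only be learned indirectly, through its effect on B's responses. This matches the abstract's point that the confounder is reconstructed alongside the observable agents rather than read from the logs.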
Keywords/Search Tags:reinforcement learning, environment reconstruction, multi-agent, hidden confounders, imitation learning, recommendation system