Reinforcement learning (RL)-based recommender systems can be modeled as a sequential decision-making problem between the system and the user. By considering long-term cumulative rewards, they can generate high-return recommendations, but the high cost of online learning makes real-world deployment impractical. Learning an efficient and robust RL agent from sequences of users' historical interactions (offline RL) is therefore an appealing alternative. However, existing offline RL-based methods face challenges in two aspects: 1) the distribution shift between training data and test data prevents the offline RL agent from accurately estimating the value of states not included in the training data; 2) the absence of negative feedback in the training data makes it difficult for recommender systems to learn an effective state representation.

To address these two challenging problems, this thesis proposes a contrastive state augmentation (CSA) framework for training RL-based recommender systems. For the first issue, this thesis proposes four state augmentation strategies that enlarge the state space of the offline data, so that the offline RL agent can access local regions around the original states; the generalization ability of the recommender system is further improved by ensuring that the learned value function remains similar between each original state and its augmented states. For the second issue, this thesis introduces negative feedback by randomly sampling negative states from other sessions and designs a contrastive loss between the augmented states and the negative states to further improve the performance of the recommender system.

To verify the effectiveness of the proposed CSA, this thesis conducts extensive experiments on two publicly accessible datasets and one dataset collected from a real e-commerce platform. The generality of CSA is verified by combining it with three state-of-the-art recommender methods. This thesis also conducts experiments in a simulated environment as an online evaluation setting. The experimental results in both the offline environment and the simulated online environment show that CSA can effectively improve recommendation performance.
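The contrastive objective described above can be sketched as follows. This is a minimal illustration only, not the thesis's implementation: the embedding dimension, the use of cosine similarity, the InfoNCE form, and the temperature value are all assumptions introduced for the sketch.

```python
import numpy as np

def contrastive_state_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss (an assumed formulation): pulls the
    augmented (positive) state toward the original anchor state while
    pushing negative states sampled from other sessions away.

    anchor:    (d,) representation of the original state
    positive:  (d,) representation of an augmented view of that state
    negatives: (n, d) state representations sampled from other sessions
    """
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

    # Slot 0 holds the positive pair; the rest are negatives.
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, neg) for neg in negatives]) / temperature
    logits -= logits.max()  # numerical stability for the softmax
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

rng = np.random.default_rng(0)
s = rng.normal(size=8)                  # original state representation
s_aug = s + 0.05 * rng.normal(size=8)   # augmented state: small perturbation
negs = rng.normal(size=(4, 8))          # negative states from other sessions
loss = contrastive_state_loss(s, s_aug, negs)
```

Minimizing this loss increases the similarity between the original state and its augmented view relative to the randomly sampled negatives, which is the intuition behind using negative states from other sessions.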