| Higher living standards and the development of information technology have led to explosive growth in the information carried by the Internet, so that the traditional approach of searching no longer meets people’s information needs. By proactively pushing information to users, the recommender system has become the new Internet portal and greatly reduces the cost of accessing information. Nowadays, most applications use a recommender system as one of their main ways to display information. The most critical part of a recommender system is the personalized recommendation algorithm, which learns from the user’s past behavior and extracts the user’s interests from massive amounts of information so that the system can display customized content. As recommender systems are applied in more and more scenarios, the requirements placed on recommendation algorithms are also increasing. A recommendation algorithm should learn the user’s historical behavior accurately, track the user’s dynamic interests, and keep recommendations diverse in order to maximize the user’s long-term experience, which is exactly where reinforcement learning has an advantage. This thesis focuses on the application of reinforcement learning algorithms to recommender systems. First, it formulates recommendation scenarios as a Markov Decision Process (MDP), which casts them in the reinforcement learning paradigm. Then, it proposes a curiosity-driven reinforcement learning recommendation algorithm based on the Actor-Critic model. An intrinsic curiosity module is introduced into the training process to reward the algorithm for exploring new states, thus enhancing the diversity and novelty of the recommendations. In addition, this thesis investigates the problem of state representation in reinforcement learning. Most studies focus on the reinforcement learning algorithm itself and ignore the impact of how the environment’s state is represented. However, the state encoding determines how the agent observes the environment, so the state representation has a significant impact on performance. This thesis proposes five different state representation networks based on LSTM networks and the attention mechanism and conducts extensive experiments to examine how they affect the performance of reinforcement learning algorithms. Finally, this thesis implements a personalized recommender system based on reinforcement learning that provides personalized movie recommendations; users can browse information about movies and TV series in the system and can also comment on and rate movies. The system is divided into two subsystems: an app for ordinary users and a dashboard that provides management functions for administrators. |