
Research On Recommendation Algorithm Based On Deep Reinforcement Learning

Posted on: 2022-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Du
Full Text: PDF
GTID: 2518306551470764
Subject: Master of Engineering
Abstract/Summary:
Recommendation models based on deep reinforcement learning have received much attention because of their flexible recommendation strategies and their ability to account for users' long-term future interaction experience. Although there is already substantial research on such models, existing work still faces two challenges.

The first challenge is that existing deep reinforcement learning recommendation models do not consider users' temporary preferences when learning user preferences. In reality, there are always atypical behaviors that seldom appeared before and clearly deviate from the general preference revealed by a user's usual behavior. Because user preferences are dynamic, even these so-called atypical interactions are not necessarily meaningless; they can also reveal a user's temporary interest. Existing work has three problems in capturing temporary preferences: first, atypical interactions are usually treated as noise and removed when learning user preferences; second, there are no explicit supervision signals to indicate which interactions are atypical; third, a user's atypical interactions are likely to be overwhelmed by the user's normal interactions.

The second challenge is that existing deep reinforcement learning recommendation models do not account for users' dynamic long-term preferences when capturing sequential preferences. Existing work has two problems in capturing long-term preferences: first, it usually learns long-term preferences from static user information, without considering that long-term preferences change over time; second, it does not distinguish the contributions of long-term and short-term preferences when combining them to learn sequential preferences, even though the two contribute differently when the user interacts with an item.

To capture users' temporary preferences, this thesis proposes a novel multi-agent, temporary-interest-aware deep reinforcement learning recommendation model (TIARec). The main idea of TIARec is to use an auxiliary classifier agent, trained by reinforcement-learning trial and error, to help the recommender agent identify the user's atypical interactions, and to use an attention network to learn the user's temporary preference from the identified atypical interactions. Extensive experiments on real datasets verify the recommendation performance of TIARec and its ability to capture users' temporary preferences.

To capture users' sequential preferences, this thesis proposes a value-based deep reinforcement learning recommendation model that integrates users' long-term and short-term preferences (LSRM). LSRM uses a hierarchical attention network to learn the user's dynamic long-term preference, and treats long-term and short-term preferences differently when learning sequential preferences. Extensive experiments on real datasets verify the recommendation performance of LSRM in comparison with competitive baselines.
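The abstract describes TIARec only at a high level, but the attention step it mentions can be illustrated concretely. The sketch below is a minimal NumPy illustration, not the thesis's actual model: it assumes the auxiliary classifier agent has already produced a boolean mask over the interaction history (the abstract does not specify how the mask or the attention query are computed), and shows how dot-product attention over only the flagged atypical interactions yields a temporary-preference vector. The function name `temporary_preference` and all shapes are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def temporary_preference(item_embs, atypical_mask, query):
    """Attention-pool the interactions flagged as atypical.

    item_embs     : (n_items, d) embeddings of the user's interaction history
    atypical_mask : (n_items,) boolean flags from the classifier agent (assumed given)
    query         : (d,) attention query, e.g. the current state representation
    """
    atypical = item_embs[atypical_mask]      # keep only atypical interactions
    weights = softmax(atypical @ query)      # dot-product attention scores
    return weights @ atypical                # weighted sum = temporary preference
```

Attending only over the masked subset keeps the temporary signal from being overwhelmed by normal interactions, which is the third problem the abstract raises with prior work.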
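The two LSRM ideas (hierarchical attention for a dynamic long-term preference, and unequal weighting of long- and short-term preferences) can likewise be sketched. This is a hypothetical NumPy illustration under assumed structure, not the thesis's architecture: it assumes the interaction history is split into sessions, attends over items within each session and then over the resulting session vectors (the "hierarchical" part), and finally attends over the long- and short-term vectors rather than averaging them.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend(vectors, query):
    """Dot-product attention pooling: weighted sum of the rows of `vectors`."""
    weights = softmax(vectors @ query)
    return weights @ vectors

def dynamic_long_term_preference(sessions, query):
    """Hierarchical attention: pool items within each session, then pool sessions.

    sessions : list of (n_i, d) arrays of item embeddings, one array per session
    query    : (d,) attention query, e.g. the current state representation
    """
    session_vecs = np.stack([attend(s, query) for s in sessions])
    return attend(session_vecs, query)       # long-term preference changes with query

def sequential_preference(long_term, short_term, query):
    """Weight long- and short-term preferences instead of treating them equally."""
    return attend(np.stack([long_term, short_term]), query)
```

Because the session-level weights depend on the query, the long-term preference vector shifts as the user's context changes, which is how this sketch reflects the "dynamic" aspect the abstract emphasizes.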
Keywords/Search Tags:Deep Reinforcement Learning, Recommendation Algorithm, Temporary Preference, Long-Term and Short-Term Preference