
Research On Online Recommendation Method Based On Multi-behavior Implicit Feedback

Posted on: 2022-03-06 | Degree: Master | Type: Thesis
Country: China | Candidate: J L Xian | Full Text: PDF
GTID: 2518306497972569 | Subject: Software engineering
Abstract/Summary:
A recommender system is a common technology for alleviating information overload. Personalized recommender systems rely on user behavior feedback, which includes explicit feedback and implicit feedback. Implicit feedback such as clicks and favorites has been widely studied and used in recommender systems because of its low collection cost, large volume, and rich hidden information. However, applying it is difficult because the interpretation of implicit feedback depends heavily on the application domain. In this paper, online recommendation based on multi-behavior implicit feedback in the e-commerce domain is formalized as a multi-armed bandit problem, and an online recommendation model based on the multi-armed bandit is proposed. The model contains three modules: the environment, the action (arm) set, and the bandit algorithm. The recommendation strategy is continuously optimized through interaction between the bandit algorithm and the environment, so as to maximize cumulative reward and provide users with an online recommendation service. The core module of the model is the bandit algorithm, a Thompson sampling algorithm based on multi-behavior implicit feedback (MIF-TS) proposed in this paper. The algorithm draws random samples from the expected reward distribution of each arm and selects the best arm to generate recommendations. We verify the effectiveness of the proposed model and algorithm on three public datasets, discuss the factors affecting the model, and propose a differentiated recommendation strategy for cold-start environments. Experimental results show that the proposed model and algorithm can effectively use implicit feedback from users' multiple behaviors to learn user preferences and address the exploration/exploitation trade-off in recommendation. In addition, a pre-training step further improves recommendation quality and makes the model robust in cold-start environments.

The main work and innovations of this paper are summarized as follows:

(1) An online recommendation model based on the multi-armed bandit is proposed, with two distinctive characteristics. First, instead of treating a single product as an arm, we use product categories derived from user behavior and product attributes. This not only avoids the complexity of a large-scale arm set but also makes full use of contextual information to capture user preferences, which makes the recommendations interpretable. Second, unlike the common Bernoulli reward setting, we divide users' multi-behavior feedback into strong interaction behaviors and weak interaction behaviors, and assign them rewards with different weights to update the expected reward distribution of the corresponding arm. In addition, arm recommendation strategies that combine users' current and historical preferences are proposed for different scenarios.

(2) We propose MIF-TS, a Thompson sampling algorithm based on multi-behavior implicit feedback. The algorithm assumes that the expected reward of each arm follows an independent Beta distribution and uses multi-behavior implicit feedback to update the posterior, so that the expected reward distribution of each arm gradually approaches the true average reward. Random samples are drawn from the posterior distributions, and the arm with the largest sampled value is selected for recommendation. This effectively balances exploration and exploitation and ensures both the accuracy and the diversity of recommendations.

(3) We conduct experiments on three public datasets with different characteristics, evaluate the performance of the proposed model and algorithm, analyze several important factors of the model in depth, and examine the model's ability to handle the cold-start problem.
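The weighted multi-behavior reward setting of contribution (1) can be sketched as follows. Strong interactions (e.g. purchase, favorite) and weak interactions (e.g. click, view) update an arm's Beta posterior with different reward weights. The behavior names and weight values below are illustrative assumptions, not the thesis's exact configuration:

```python
# Assumed reward weights per behavior type (hypothetical values):
# strong interactions get large rewards, weak interactions small ones.
BEHAVIOR_WEIGHTS = {
    "purchase": 1.0,   # strong interaction
    "favorite": 0.8,   # strong interaction
    "click": 0.3,      # weak interaction
    "view": 0.1,       # weak interaction
}

def update_posterior(alpha, beta, behavior):
    """Update one arm's Beta(alpha, beta) posterior from a single behavior.

    The (possibly fractional) reward r is added to alpha and its
    complement 1 - r to beta, generalizing the Bernoulli update.
    """
    r = BEHAVIOR_WEIGHTS.get(behavior, 0.0)  # unknown behavior -> reward 0
    return alpha + r, beta + (1.0 - r)

# Starting from a uniform Beta(1, 1) prior, a purchase raises the
# posterior mean more than a click does.
a_buy, b_buy = update_posterior(1.0, 1.0, "purchase")  # (2.0, 1.0)
a_clk, b_clk = update_posterior(1.0, 1.0, "click")     # (1.3, 1.7)
```

With this update, repeated strong interactions pull an arm's posterior mean toward 1 much faster than weak ones, which is how the model distinguishes the two behavior classes.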
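The MIF-TS sampling loop of contribution (2) can be sketched as a standard Thompson sampling routine over Beta posteriors, here with a toy simulated environment. The function names, the reward simulation, and the prior values are assumptions for illustration only; the thesis's arms are product categories and its rewards come from weighted multi-behavior feedback:

```python
import random

def mif_ts(n_arms, n_rounds, reward_fn, seed=0):
    """Thompson sampling sketch: Beta(alpha, beta) posterior per arm.

    Each round, draw one sample from every arm's posterior, play the
    arm with the largest sample, observe a reward in [0, 1] (possibly
    fractional, as with weighted implicit feedback), and update that
    arm's posterior.
    """
    rng = random.Random(seed)
    alpha = [1.0] * n_arms  # uniform Beta(1, 1) priors
    beta = [1.0] * n_arms
    total_reward = 0.0
    for _ in range(n_rounds):
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)
        r = reward_fn(arm, rng)
        total_reward += r
        alpha[arm] += r          # posterior update with observed reward
        beta[arm] += 1.0 - r
    return alpha, beta, total_reward

# Toy environment with three arms; arm 2 has the highest true reward rate.
probs = [0.2, 0.4, 0.8]
alpha, beta, total = mif_ts(3, 2000,
                            lambda a, rng: float(rng.random() < probs[a]))
```

Because sampling (rather than always taking the current best mean) occasionally tries uncertain arms, the loop balances exploration and exploitation: the posterior of the best arm concentrates near its true reward rate while inferior arms are gradually abandoned.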
Keywords: multi-behavior implicit feedback, recommender systems, multi-armed bandit, Thompson sampling, exploration and exploitation