Research And Implementation Of Multi-scene Retrieval Method Based On Vectors

Posted on: 2022-07-29
Degree: Master
Type: Thesis
Country: China
Candidate: X F Tian
Full Text: PDF
GTID: 2518306563964489
Subject: Software engineering
Abstract/Summary:
This thesis grows out of my internship project at Xiaohongshu. Based on the user's behavior sequence, user profile, note profile, and context, the project estimates which notes a user is interested in and performs recall (the stage of a recommendation system that coarsely selects candidate items to recommend to users). For commercial reasons, the platform embeds feed ads among the notes, so the recalled notes should not only match user preferences but also maximize the platform's interests. The main work of this thesis is in the recall stage, chiefly research on vectorized recall.

Because the recall stage has a huge candidate set and strict real-time requirements, current systems generally adopt multi-channel recall, with each channel serving a specific recall purpose. Recall methods based on strategy (such as recall by popularity and geographical location) and on statistics (such as collaborative filtering and matrix factorization) are easy to deploy and highly interpretable, but their usage scenarios are limited and they struggle to meet users' personalized needs. Existing graph representation learning methods, however, are strongly influenced by popular items, which leads to a serious Matthew effect.

To solve these problems, this thesis proposes a multi-scene recall method based on vectorization. The method covers two scenarios: note recall for specific users (U2I) and note recall between similar notes (I2I). First, a variety of data sources are analyzed and processed, and the features are screened according to their importance. Different algorithms are then used in the two scenarios. In the U2I scenario, the two-tower DSSM model is used to optimize the interaction between users and notes. In the input layer, the numerical features are divided into buckets according to their distribution, and all features are then vectorized by embedding, which effectively reduces the number of model parameters and accelerates model convergence. At the same time, a cross layer is added to the model to improve the efficiency of feature extraction. In the I2I scenario, this thesis uses a graph attention network to learn the adjacency relationships between notes and updates the node representations through the aggregation of multiple attention layers, so that each note's representation integrates global feature information. After all vectors are exported, the notes are retrieved and recalled.

This thesis samples 7 days of user-note interaction behavior on Xiaohongshu to build the training set and the evaluation set. Two kinds of metrics are used in the evaluation: offline metrics, namely AUC (Area Under the Curve) and accuracy in the training and testing stages, and a recall metric, Recall@K (which compares the top K predicted items against the sequence of ground-truth items). In the experiment, the AUC is 0.7619 and the accuracy is 0.763. Compared with other existing methods, the proposed method has lower prediction error and better recall results, which demonstrates its effectiveness.
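As a rough illustration of the retrieval and evaluation steps described above, the following minimal Python sketch ranks notes for one user by vector similarity and then computes Recall@K. It is not the thesis's implementation. The abstract names neither the similarity function nor the index, so the inner-product scoring, the brute-force ranking, the toy vectors, and the helper names `dot`, `top_k_notes`, and `recall_at_k` are all assumptions made for illustration. Recall@K is taken here in its common form: the fraction of ground-truth notes that appear among the top K recalled notes.

```python
from typing import Dict, List, Sequence, Set


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors (assumed similarity score)."""
    return sum(a * b for a, b in zip(u, v))


def top_k_notes(user_vec: Sequence[float],
                note_vecs: Dict[str, Sequence[float]],
                k: int) -> List[str]:
    """Brute-force retrieval: rank every note by its score, keep the top k."""
    ranked = sorted(note_vecs,
                    key=lambda note_id: dot(user_vec, note_vecs[note_id]),
                    reverse=True)
    return ranked[:k]


def recall_at_k(recalled: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant notes found among the first k recalled notes."""
    if not relevant:
        return 0.0  # convention chosen here when there are no ground-truth items
    hits = len(set(recalled[:k]) & relevant)
    return hits / len(relevant)


# Hypothetical toy data: 2-dimensional vectors standing in for exported embeddings.
note_vecs = {
    "n1": [1.0, 0.0],
    "n2": [0.8, 0.6],
    "n3": [0.0, 1.0],
    "n4": [-1.0, 0.0],
}
user_vec = [0.9, 0.1]
relevant = {"n1", "n3"}  # notes the user actually interacted with (made up)

recalled = top_k_notes(user_vec, note_vecs, k=2)
print(recalled)                            # ['n1', 'n2']
print(recall_at_k(recalled, relevant, 2))  # 0.5
```

Sorting every candidate is only practical for a toy example. With a candidate set as large as the one the abstract describes, an approximate nearest-neighbor index would normally replace the full sort.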
Keywords/Search Tags: recommendation system, behavior modeling, recall, graph neural network, vector matching