
Design Of Embedding-based Recall Algorithm In Large-scale Recommendation Scenarios

Posted on: 2021-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Zhou
GTID: 2518306575953709
Subject: Software engineering
Abstract/Summary:
Owing to the rapid development of the mobile Internet and information technology, information overload is becoming increasingly serious. Many Internet companies have therefore added personalized recommendation systems to their products to help users efficiently find content they are interested in. This increases user stickiness and conversion rates and ultimately serves the company's goal of sustained commercial growth. Recommendation systems used in industry today consist mainly of two stages: recall and ranking. The recall stage selects, from a very large pool of candidates, the items a user is likely to be interested in and passes them to the ranking stage as input. Optimizing the recall service therefore gives the ranking stage better candidates and improves the user experience.

To this end, this thesis reviews several embedding-based recall methods commonly used in industry and carries out the following research for large-scale recommendation scenarios. First, an embedded feature selection method based on FFM (field-aware factorization machines) is designed to filter the original feature set, reducing model size and saving storage without any loss in the AUC of the recall model. Second, because training the recall model with impressed-but-unclicked items as negative samples yields low recall, a negative sampling scheme whose distribution is closer to that of the item candidate set is designed; it combines random negative samples, impressed-but-unclicked negative samples, and hard negative samples. Third, two models are proposed: an improved two-tower DNN model and an end-to-end feature selection model. The former adds an FFM feature-crossing layer to the traditional two-tower recall model to further cross the user vector, the item vector, and the context vector. The latter embeds the skip connections of ResNet and the multi-channel hybrid structure of SENet into the computation of the user subgraph and the item subgraph, respectively, so that the model can select important features end to end. Fourth, to address the problem of "islands" (poorly connected parts of the graph) that arises when hierarchical navigable small world (HNSW) graphs are used for embedding-based recall in practice, a heuristic pruning optimization is proposed; it increases the connectivity of the index graph and improves the accuracy of approximate nearest neighbor retrieval. Finally, the above work was deployed in an information-flow (feed) recommendation scenario with more than 100 million daily active users (DAU). It achieved consistent gains in both offline metrics and online A/B tests, improving the user experience of feed recommendation and bringing more commercial revenue to the company.
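The "islands" problem comes from how HNSW prunes each node's neighbor list. The abstract does not describe the thesis's own pruning scheme, so the following is only a hedged illustration of the neighbor-selection heuristic from the original HNSW paper (Malkov and Yashunin), including its optional step of keeping pruned connections. A candidate is accepted only if it is closer to the base node than to every neighbor already accepted, which spreads links across directions; topping the list back up to `m` with pruned candidates is one simple way to keep nodes from ending up with too few links. The function names, the pure-Python L2 metric, the dict-of-vectors input, and the `keep_pruned` parameter are assumptions made for this sketch, not part of the thesis or any particular library.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors (assumed metric for this sketch)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_neighbors_heuristic(
    query: Vector,
    candidates: Dict[int, Vector],
    m: int,
    keep_pruned: bool = True,
) -> List[int]:
    """Hypothetical helper: pick up to m neighbor ids for `query`.

    A candidate is accepted only if it is closer to `query` than to every
    neighbor accepted so far, which favors neighbors in different directions.
    With keep_pruned=True, rejected candidates (nearest first) are used to
    fill the list back up to m, so the node keeps as many links as allowed.
    """
    # Visit candidates from nearest to farthest (ties broken by id).
    ordered: List[Tuple[float, int]] = sorted(
        (l2(query, vec), cid) for cid, vec in candidates.items()
    )
    selected: List[int] = []
    discarded: List[int] = []
    for dist_q, cid in ordered:
        if len(selected) >= m:
            break
        vec = candidates[cid]
        if all(dist_q < l2(vec, candidates[sid]) for sid in selected):
            selected.append(cid)
        else:
            discarded.append(cid)
    if keep_pruned:
        # `discarded` is already in ascending distance order.
        for cid in discarded:
            if len(selected) >= m:
                break
            selected.append(cid)
    return selected


if __name__ == "__main__":
    toy = {1: (1.0, 0.0), 2: (1.1, 0.1), 3: (0.0, 2.0)}
    # Plain top-2 would keep ids 1 and 2, which almost coincide.
    print(select_neighbors_heuristic((0.0, 0.0), toy, m=2))  # [1, 3]
    # With room for 3 links, the pruned id 2 is added back.
    print(select_neighbors_heuristic((0.0, 0.0), toy, m=3))  # [1, 3, 2]
```

In the toy example, plain nearest-neighbor selection with `m=2` would link the base node only to two almost identical points, whereas the heuristic keeps one link in each direction. Whether to refill pruned links is a trade-off between graph connectivity and per-node memory and search cost.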
Keywords/Search Tags: Information flow, Recommender system, Embedding-based recall, Approximate nearest neighbor search, Hierarchical navigable small world graphs