
Research On In-batch Negative Sampling And Utilization Of Recommendation Systems

Posted on: 2024-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Li
Full Text: PDF
GTID: 2568306929490334
Subject: Computer Science and Technology
Abstract/Summary:
With the growth of the Internet industry and the development of mobile communication technology, the problem of information overload has become increasingly severe, and recommender systems serve as important tools to address it. Since implicit feedback data contains only positive samples reflecting user interest, while machine learning algorithms require negative samples to train recommendation models, negative sampling techniques are commonly used to construct them. Among these techniques, in-batch sampling does not require reloading and encoding items and features outside the mini-batch, making it more practical in large-scale industrial recommendation scenarios. However, in-batch sampling has inherent limitations: (1) the sampling range and sampling distribution are constrained, both by the exposure bias of the items appearing within the mini-batch and by the fact that a mini-batch often contains little data; (2) existing research neither explicitly distinguishes and exploits the difficulty levels of negative samples nor accounts for the uncertainty of that difficulty, even though leveraging hard negative samples can improve the sampling quality of the in-batch strategy.

This thesis focuses on exposure bias in in-batch sampling and on the utilization of hard negatives, investigating these problems when the recommendation model is formulated as binary classification and as multi-class classification. To address them, it builds on in-batch sampling by weighting negative samples via importance sampling and by introducing a caching mechanism. The two proposed methods, NWIS and DBRS, are as follows:

(1) Negative sample weighting strategy based on importance sampling. To address the above issues in the binary classification setting, this thesis proposes NWIS, which corrects exposure bias and enhances the utilization of hard negative samples. Treating the popularity distribution underlying exposure bias as the proposal distribution and the probability distribution of hard negatives as the target distribution, importance sampling is used to weight the negative-sample term of the binary classification loss. This increases the weight of hard negatives in the loss, thereby guiding the optimization direction and improving the model's recommendation performance. NWIS is further extended from in-batch sampling to global popularity sampling, and its effectiveness and training efficiency are validated on four public datasets under both sampling modes.

(2) Cache-enhanced in-batch sampling with a difficulty-based replacement strategy. To address the above problems in the multi-class setting, this thesis defines the training difficulty of a negative sample as the degree to which the model scores it higher than the paired positive sample during training; explicitly measuring and exploiting hard negatives under this definition further improves recommendation performance. Building on this definition, the thesis presents DBRS, which augments in-batch sampling with a cache maintained according to training difficulty. The strategy adaptively and heuristically updates the cache by computing training difficulty, estimating its mean and variance for each item, and sampling negatives from both the mini-batch and the cache; samples with higher training difficulty and higher uncertainty are retained in the cache with higher probability. The model can thus adaptively explore and exploit negative samples during training, enriching the information carried by negatives and converging better. The effectiveness of DBRS is validated on four public datasets in various scenarios, improving the recommendation performance of in-batch sampling models.
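The two ideas above can be sketched in a few lines. The following is a minimal illustration, not the thesis's exact equations: the proposal distribution is approximated by item popularity, the target "hard negative" distribution by a softmax over the model's negative scores, and training difficulty by the margin by which a negative out-scores its paired positive; all function names, the self-normalization of the weights, and the Welford-style cache statistics are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nwis_weighted_bce(pos_scores, neg_scores, neg_popularity, temperature=1.0):
    """NWIS-style sketch: importance-weight in-batch negatives in a BCE loss.

    q (proposal) ~ item popularity, the distribution in-batch sampling
    implicitly draws from; p (target) ~ softmax over negative scores, so
    hard negatives receive more mass. w = p / q up-weights hard, unpopular
    negatives and down-weights easy, over-exposed ones.
    """
    q = neg_popularity / neg_popularity.sum()   # proposal: popularity
    p = np.exp(neg_scores / temperature)
    p = p / p.sum()                             # target: hard negatives
    w = p / q
    w = w / w.mean()                            # self-normalized weights
    pos_loss = -np.log(sigmoid(pos_scores)).mean()
    neg_loss = -(w * np.log(1.0 - sigmoid(neg_scores))).mean()
    return pos_loss + neg_loss

def training_difficulty(pos_score, neg_scores):
    """DBRS-style sketch: difficulty = how much a negative out-scores the
    paired positive (zero when the model already ranks it correctly)."""
    return np.maximum(neg_scores - pos_score, 0.0)

def update_cache_stats(mean, m2, count, difficulty):
    """Welford-style running update of an item's difficulty statistics.

    Returns (mean, m2, count); variance = m2 / count. The mean captures how
    hard the item is on average, the variance how uncertain that estimate
    is -- hard AND uncertain items are preferred for the cache.
    """
    count += 1
    delta = difficulty - mean
    mean += delta / count
    m2 += delta * (difficulty - mean)
    return mean, m2, count
```

In a real training loop these pieces would be applied per mini-batch: score positives and in-batch negatives, weight the negative loss term with `nwis_weighted_bce`-style weights (binary setting), or compute `training_difficulty`, refresh each item's cache statistics, and resample negatives from both the batch and the cache (multi-class setting).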
Keywords/Search Tags: Negative Sampling, Recommender System, Importance Sampling, Information Retrieval, Deep Learning