
Research on Recommendation Models for Sparse Data

Posted on: 2021-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
GTID: 2428330611964265
Subject: Computer software and theory
Abstract/Summary:
The rapid spread and development of the Internet has led to explosive growth in data volume. Collaborative filtering recommendation mines users' interests by analyzing user attributes, item attributes, and the interaction records between users and items, and proactively suggests items that users may be interested in. It plays an important role in alleviating the problem of "information overload".

User ratings are one of the data foundations of collaborative filtering. There are two general rating systems: single-criteria and multi-criteria. The former has only a single overall rating, while the latter provides the user's ratings on several criteria in addition to the overall rating. As a website grows, its rating data becomes increasingly sparse, and the recommendation quality of rating-based collaborative filtering systems declines sharply.

To alleviate the negative impact of sparse rating data, scholars have proposed a series of collaborative filtering algorithms. For sparse single-criteria rating data, many scholars introduce auxiliary information, such as social relations, comments, and tags, to describe users or items more fully. For sparse multi-criteria rating data, scholars have proposed dimension reduction, but dimension reduction discards a small portion of the original ratings and therefore wastes information. At present, multi-criteria ratings are mostly aggregated by linear regression, which does not capture the complex mapping between the overall rating and the rating on each criterion. For item recommendation, most scholars rank items by predicted rating and recommend the highest-rated ones, but users are not always interested in items with high predicted ratings, and some items that rank higher in users' minds are ignored.

Based on the above analysis, and starting from the problem of sparse rating data, this thesis studies the tasks of rating prediction and item recommendation from the following three aspects:

(1) To address the sparsity of single-criteria rating data, a rating prediction model combined with item genres information (RPIG) is proposed. The impact of rating sparsity is alleviated by extracting the user's preference for item genres together with three auxiliary features: the user's average rating, the item's average rating, and the user's average rating for the item's genres. The extracted auxiliary information is assembled into a training sample set, and a GBRT regression model is fitted to the training samples to predict a user's rating for an item. Comparative experiments against related work on the MovieLens 100K and MovieLens 1M datasets show that the proposed RPIG model can effectively improve the accuracy of rating prediction.

(2) To address rating sparsity and rating aggregation in the multi-criteria setting, a multi-criteria rating aggregation model based on reliable factors (MCRF) is proposed. Because matrix filling does not discard any original information, it is used to reduce the influence of multi-criteria rating sparsity on the accuracy of rating prediction. After each criterion's rating matrix is pre-filled using a fused user similarity, a reliable factor is introduced to measure the credibility of each pre-filled rating, which reduces the impact of filling errors on the calculation of user similarity. Two methods are used to aggregate the multi-criteria ratings. The first calculates the user's criteria tendency and the consistency of the criteria ratings to obtain the user's preference weight for each criterion, and aggregates the multi-criteria ratings using these weights. The second uses a GBRT regression model to fit each user's multi-criteria ratings and predict the overall rating. Comparative experiments against related work on three sub-datasets of the Yahoo! Movies dataset (YM-20-20, YM-10-10, and YM-5-5) show that the proposed MCRF model can effectively improve the accuracy of rating prediction.

(3) To address rating sparsity and item ordering, and based on the assumption that distance reflects preference, a multi-criteria recommendation model based on metric learning (MCML) is proposed. Metric learning treats distance as a measure of the user's preference for an item. The distance between user and item is learned in a metric space and then inverted to obtain a predicted rating, which is used to fill the rating matrix of each criterion and alleviate the impact of multi-criteria rating sparsity. In the metric spaces of the filled criteria ratings and the overall rating, multiple groups of user-to-item distances are learned, and all distances are ranked jointly to provide users with Top-N recommendation lists. Comparative experiments against related work on the YM-20-20 sub-dataset of the Yahoo! Movies dataset show that the proposed MCML model can effectively improve the accuracy of item recommendation.
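
As an illustration of the three auxiliary features named for RPIG in aspect (1), the following minimal Python sketch computes a user's average rating, an item's average rating, and the user's average rating over the item's genres. It is not the thesis's implementation: the function names (`build_feature_fn`, `features_for`), the input layout, and the fallback rules for missing data (global mean, then user average) are assumptions made here for illustration. The extraction of the user's genre preference and the GBRT fit itself are not shown.

```python
from collections import defaultdict


def _mean(values, default):
    """Arithmetic mean, or `default` when there are no values."""
    return sum(values) / len(values) if values else default


def build_feature_fn(ratings, item_genres):
    """Return a function mapping (user, item) to three auxiliary features.

    ratings:     list of (user, item, rating) tuples (hypothetical layout)
    item_genres: dict mapping item -> set of genre names (hypothetical layout)
    """
    by_user = defaultdict(list)
    by_item = defaultdict(list)
    by_user_genre = defaultdict(list)
    for user, item, rating in ratings:
        by_user[user].append(rating)
        by_item[item].append(rating)
        for genre in item_genres.get(item, ()):
            by_user_genre[(user, genre)].append(rating)

    # Assumed fallback for users or items with no ratings at all.
    global_mean = _mean([r for _, _, r in ratings], 0.0)

    def features_for(user, item):
        user_avg = _mean(by_user.get(user, []), global_mean)
        item_avg = _mean(by_item.get(item, []), global_mean)
        # Average the user's per-genre means over the genres of this item,
        # skipping genres the user has never rated.
        genre_means = [
            _mean(by_user_genre[(user, g)], user_avg)
            for g in item_genres.get(item, ())
            if (user, g) in by_user_genre
        ]
        # Assumed fallback: use the user's average if no genre was rated.
        user_genre_avg = _mean(genre_means, user_avg)
        return [user_avg, item_avg, user_genre_avg]

    return features_for
```

Feature vectors built this way, paired with the known ratings, would form the training sample set; any GBRT implementation (for example scikit-learn's `GradientBoostingRegressor`) could then be fitted to them. When building training rows, the rating being predicted would normally be left out of its own averages to avoid leaking the target into the features.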
Keywords/Search Tags:sparsity, item genres preference, multi-criteria collaborative filtering, matrix filling, metric learning