Study On The Sparsity Problem Of Collaborative Filtering Algorithm

Posted on: 2014-12-20; Degree: Master; Type: Thesis
Country: China; Candidate: J L Dai; Full Text: PDF
GTID: 2268330392971423; Subject: Computer system architecture
Abstract/Summary:
In the information age, the coverage and popularity of the Internet keep growing, which has brought e-commerce into people's lives and made network data grow explosively. On the other hand, people now spend more time searching for what they really need, a problem known as information overload. Solving it is very important for e-commerce platforms. Recommendation systems emerged to fill this gap, and the collaborative filtering (CF) algorithm in particular has achieved great success. However, as the numbers of users and items increase sharply, the percentage of rated items decreases markedly, so the user-item matrix becomes sparse, which lowers the prediction accuracy of the traditional CF algorithm. The sparsity problem has become the bottleneck of CF-based recommendation systems (RS). To address it, this thesis takes three measures: a conditional probability algorithm, flexible neighbor selection, and filling the unrated items of the user-item matrix in two steps, called two step filling.

Firstly, the thesis proposes an algorithm based on conditional probability that can find neighbors the traditional collaborative filtering algorithm cannot find. These neighbors then predict the score of the target item, and the final score is generated by combining this algorithm with the traditional CF algorithm.

Secondly, the thesis sets fixed thresholds on two parameters: the user similarity, and the quantity of items rated by the target user and its neighbors. It then chooses the users who meet these thresholds as the target user's neighbors, instead of choosing a fixed k neighbors, which makes the neighbors more credible.

Thirdly, it divides the prediction process into two steps. The first step sets strict thresholds on the user similarity and on the quantity of items rated by the target user and its neighbors, chooses the users who meet the thresholds as the target user's neighbors, and fills part of the unrated items. After this first step the user-item matrix is denser, so some users who previously fell short of the thresholds now meet them and become the target user's neighbors, giving the target user more neighbors to choose from. The second step relaxes the thresholds so that all the remaining unrated items are filled.

Finally, experiments are run on the MovieLens dataset using Eclipse. The results indicate that the flexible neighbor selection strategy is more credible than the kNN strategy, with a lower MAE. The conditional probability algorithm can find potential neighbors and alleviate the data sparsity problem. The two step filling strategy also clearly alleviates the data sparsity problem and improves prediction accuracy.
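
The threshold-based neighbor selection and the two step filling described in the abstract can be illustrated with a small Python sketch. It is not the thesis's implementation: the abstract does not name the similarity measure or the prediction formula, so Pearson correlation over co-rated items and a mean-centered weighted average are assumptions made here, as are the function names (`pearson`, `fill_pass`), the toy ratings, and the threshold values. The "quantity of items" threshold is interpreted as the number of items rated by both users, which is also an assumption.

```python
import math

def pearson(ra, rb):
    """Return (similarity, number of co-rated items) for two rating dicts.
    Pearson correlation is an assumed choice of similarity measure."""
    common = [i for i in ra if i in rb]
    n = len(common)
    if n < 2:
        return 0.0, n
    ma = sum(ra[i] for i in common) / n
    mb = sum(rb[i] for i in common) / n
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    den = math.sqrt(sum((ra[i] - ma) ** 2 for i in common)
                    * sum((rb[i] - mb) ** 2 for i in common))
    return (num / den if den else 0.0), n

def fill_pass(ratings, items, sim_min, common_min):
    """Fill each unrated cell from the neighbors that meet both thresholds.
    There is no fixed k: every user who passes the thresholds is a neighbor.
    Assumes every user has at least one rating."""
    means = {u: sum(r.values()) / len(r) for u, r in ratings.items()}
    filled = {u: dict(r) for u, r in ratings.items()}
    for u, ru in ratings.items():
        for item in items:
            if item in ru:
                continue
            num = den = 0.0
            for v, rv in ratings.items():
                if v == u or item not in rv:
                    continue
                sim, n = pearson(ru, rv)
                if sim >= sim_min and n >= common_min:
                    num += sim * (rv[item] - means[v])
                    den += abs(sim)
            if den > 0:  # leave the cell empty when no neighbor qualifies
                filled[u][item] = means[u] + num / den
    return filled

def missing(matrix, items):
    """Count the unrated cells of a user-item matrix."""
    return sum(1 for r in matrix.values() for i in items if i not in r)

# Hypothetical toy data on a 1-5 scale, for illustration only.
items = ["a", "b", "c", "d"]
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 2, "c": 3, "d": 4},
    "u3": {"a": 5, "b": 3},
    "u4": {"c": 2, "d": 4},
}

# Step 1: strict thresholds fill only the cells with trustworthy neighbors.
step1 = fill_pass(ratings, items, sim_min=0.9, common_min=3)
# Step 2: relaxed thresholds, applied to the denser matrix from step 1.
step2 = fill_pass(step1, items, sim_min=0.5, common_min=2)

print(missing(ratings, items), missing(step1, items), missing(step2, items))
```

In this toy data, `u1` and `u4` share only one rated item at first, so `u1` cannot serve as a neighbor of `u4`. Once step 1 fills `u1`'s rating for item `d`, the two users share two items and `u1` qualifies under the relaxed thresholds of step 2, which is the effect the abstract describes. Relaxed thresholds do not by themselves guarantee that every cell is filled on real data; a fallback such as the user's mean rating would be one option, though the abstract does not say how the thesis handles that case.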
Keywords/Search Tags: recommendation system, conditional probability, collaborative filtering, data sparsity, two step filling