Research On Distributed Collaborative Filtering Recommendation Algorithm Based On Fast Matrix Factorization

Posted on: 2021-10-15
Degree: Master
Type: Thesis
Country: China
Candidate: L Chen
Full Text: PDF
GTID: 2518306122968609
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology, data on the Internet is accumulating at an unprecedented rate, and this mass of data causes serious information overload. Personalized recommendation is one effective way to address the problem. In big-data scenarios, a recommendation system usually has to process large-scale, high-dimensional, sparse data. Among the various recommendation algorithms, collaborative filtering based on matrix factorization offers high accuracy and good scalability on this kind of data, so it has been widely studied and used. However, applying matrix factorization to large-scale implicit feedback data raises three problems. First, implicit feedback inherently lacks negative feedback, so a model built directly on the observed implicit data cannot effectively reflect user preferences. Second, negative feedback has to be inferred from the missing data, which is usually several orders of magnitude larger than the observed data, and this greatly increases the time complexity of the optimization algorithm. Finally, the limited computing resources of a single machine severely restrict the efficiency of processing large-scale recommendation data. To solve these problems, this thesis studies distributed collaborative filtering based on matrix factorization and proposes the following improved algorithms.

(1) To address the lack of negative feedback and the low efficiency of model training in implicit feedback recommendation, this thesis proposes a user-activity and item-popularity weighted matrix factorization (UIWMF) recommendation algorithm. UIWMF weights the missing data according to user activity and item popularity. Compared with the traditional uniform missing-data weighting strategy, it extracts negative feedback from missing data more effectively and achieves higher recommendation accuracy. In addition, to improve the training efficiency of UIWMF, this thesis proposes a fast matrix factorization optimization algorithm based on cyclic coordinate descent, which avoids a large number of repeated calculations through a carefully designed cache matrix and thereby improves training efficiency.

(2) To overcome the resource limits of a single machine, this thesis proposes an efficient distributed UIWMF (DUIWMF) algorithm based on Spark. By adopting a distributed cache strategy based on in-blocks and out-blocks, DUIWMF avoids transmitting the useless and repeated feature vectors that the traditional broadcast communication strategy sends, which significantly reduces communication overhead.

Comprehensive experiments are carried out on Alibaba Cloud E-MapReduce using three public recommendation datasets. The experiments are divided into two parts. The first part verifies the effectiveness of UIWMF: comparison with several baseline matrix factorization algorithms for implicit feedback recommendation confirms its advantage in recommendation accuracy. The second part compares DUIWMF with two baseline distributed recommendation algorithms, and the results verify its efficiency. In addition, a large number of experiments verify the scalability of DUIWMF.
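The abstract does not give the UIWMF weighting formula or the layout of its cache matrix, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes, hypothetically, that the weight of a missing entry factors as a_u * c_i, where a_u grows with user activity and c_i with item popularity (both taken here as normalized powers of interaction counts, with exponent `alpha` chosen for illustration). Under that assumption, the weighted squared-prediction penalty summed over every user-item pair can be computed from a small cached k x k matrix instead of a loop over all pairs. All function names are illustrative, not taken from the thesis.

```python
import random


def missing_weights(interactions, n_users, n_items, alpha=0.5):
    """Assumed weighting: normalized count**alpha per user and per item."""
    user_cnt = [0] * n_users
    item_cnt = [0] * n_items
    for u, i in interactions:
        user_cnt[u] += 1
        item_cnt[i] += 1

    def normalize(counts):
        powered = [cnt ** alpha for cnt in counts]
        total = sum(powered)  # assumes at least one interaction
        return [p / total for p in powered]

    return normalize(user_cnt), normalize(item_cnt)


def naive_penalty(P, Q, a, c):
    """Sum of a_u * c_i * (p_u . q_i)^2 over all pairs: O(users * items * k)."""
    total = 0.0
    for u, p in enumerate(P):
        for i, q in enumerate(Q):
            pred = sum(x * y for x, y in zip(p, q))
            total += a[u] * c[i] * pred * pred
    return total


def cached_penalty(P, Q, a, c):
    """Same value via the cache S = sum_i c_i * q_i q_i^T: O((users + items) * k^2)."""
    k = len(Q[0])
    S = [[0.0] * k for _ in range(k)]
    for i, q in enumerate(Q):
        for f in range(k):
            for g in range(k):
                S[f][g] += c[i] * q[f] * q[g]
    total = 0.0
    for u, p in enumerate(P):
        quad = sum(p[f] * S[f][g] * p[g] for f in range(k) for g in range(k))
        total += a[u] * quad
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    n_users, n_items, k = 6, 8, 3
    interactions = [(rng.randrange(n_users), rng.randrange(n_items)) for _ in range(15)]
    a, c = missing_weights(interactions, n_users, n_items)
    P = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n_items)]
    print(naive_penalty(P, Q, a, c), cached_penalty(P, Q, a, c))
```

The two functions return the same value up to floating-point rounding, but the cached version never enumerates individual user-item pairs. In a full training loop the sum would also cover observed pairs, which would be corrected in a separate pass over the sparse data; that step is omitted here.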
Keywords/Search Tags: Implicit feedback recommendation, Collaborative filtering, Fast optimization, Distributed matrix factorization, Spark