Research And Implementation Of Recommendation System Based On Spark Machine Learning

Posted on: 2020-09-24
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhou
Full Text: PDF
GTID: 2428330596978695
Subject: Computer technology

Abstract/Summary:
With the arrival of the era of massive data, extracting useful information from that data has become an important problem, known as information overload. Recommendation systems have become one of the most effective means of addressing it. As the amount of data grows, however, recommendation systems face challenges of their own, notably data sparsity and the need to process massive data sets, both of which bear on how to infer a user's interests and preferences from existing user data. In real-world scenarios, few users explicitly rate the items they view, which makes the rating data sparse, and an explicit rating does not fully express a user's preference for an item. Compared with explicit data, a user's implicit data, such as browsing records and time data, has greater research value: by studying implicit data, a system can identify the items a user is interested in more accurately. Quickly computing the list of items a user is interested in from massive data is a further challenge. With the growing popularity of machine learning, a trained recommendation model can process massive amounts of user data quickly and compute that list.

To address these problems in existing recommendation systems, this thesis designs a recommendation system based on Spark machine learning. Its main work is as follows. First, through an analysis of model-based collaborative filtering, the recommendation problem is transformed into a machine learning classification and prediction problem. User data is built into a user-item matrix, to which the dimensionality reduction of singular value decomposition and the loss function optimization of alternating least squares are applied. An optimized design of alternating least squares for implicit feedback is then proposed, which addresses the problem of user data sparsity. Second, to address the problem of massive data processing, the implicit-feedback alternating least squares algorithm is used to train a recommendation model on users' historical implicit data, including data preprocessing and model performance improvement. Finally, the model is deployed on the Spark big data processing platform, and recommendation results are computed for a given user. In addition, a combined performance monitoring platform based on the Spark Web UI and Ganglia is built to monitor data processing on the Spark platform, providing a reliable basis for allocating and adjusting cluster resources.

The experimental results show that as the number of Spark cluster nodes increases, the speedup ratio for processing large data sets rises rapidly, indicating that a Spark cluster can deliver relatively high performance on massive data. With the same model parameters, the root mean square error (RMSE) of the implicit-feedback alternating least squares model is generally lower than that of the explicit-feedback alternating least squares model across different numbers of iterations, indicating that a recommendation model trained on implicit user data with alternating least squares can more accurately identify the list of items a user prefers.
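
The implicit-feedback alternating least squares step described in the abstract can be sketched as follows. This is a minimal pure-Python illustration and not the thesis's Spark implementation. It assumes the commonly used formulation in which each raw interaction count r is turned into a binary preference p (1 if r > 0, else 0) and a confidence c = 1 + alpha * r, and it uses plain L2 regularization. The names `implicit_als`, `update`, `loss`, `alpha`, `reg`, and the toy counts are all hypothetical.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def update(fixed, counts, alpha, reg):
    """Recompute one side's factors while the other side (`fixed`) is held constant.
    counts[i][j] is the raw interaction count between row i and fixed[j]."""
    k = len(fixed[0])
    out = []
    for row in counts:
        # Normal equations: (sum_j c_j y_j y_j^T + reg * I) x = sum_j c_j p_j y_j
        a = [[reg if i == j else 0.0 for j in range(k)] for i in range(k)]
        b = [0.0] * k
        for y, r in zip(fixed, row):
            c = 1.0 + alpha * r          # confidence (assumed formulation)
            p = 1.0 if r > 0 else 0.0    # binary preference
            for i in range(k):
                b[i] += c * p * y[i]
                for j in range(k):
                    a[i][j] += c * y[i] * y[j]
        out.append(solve(a, b))
    return out

def loss(x, y, counts, alpha, reg):
    """Confidence-weighted squared error plus L2 regularization."""
    total = 0.0
    for u, row in enumerate(counts):
        for i, r in enumerate(row):
            c = 1.0 + alpha * r
            p = 1.0 if r > 0 else 0.0
            total += c * (p - dot(x[u], y[i])) ** 2
    return total + reg * (sum(dot(v, v) for v in x) + sum(dot(v, v) for v in y))

def implicit_als(counts, rank=2, alpha=10.0, reg=0.1, iters=5, seed=0):
    rng = random.Random(seed)
    n_users, n_items = len(counts), len(counts[0])
    x = [[rng.random() for _ in range(rank)] for _ in range(n_users)]
    y = [[rng.random() for _ in range(rank)] for _ in range(n_items)]
    transposed = [list(col) for col in zip(*counts)]
    history = [loss(x, y, counts, alpha, reg)]
    for _ in range(iters):
        x = update(y, counts, alpha, reg)      # fix items, solve for users
        y = update(x, transposed, alpha, reg)  # fix users, solve for items
        history.append(loss(x, y, counts, alpha, reg))
    return x, y, history

# Toy interaction counts (rows: users, columns: items); the values are made up.
counts = [
    [5, 3, 0, 0],
    [4, 0, 0, 1],
    [0, 0, 4, 5],
    [0, 1, 5, 4],
]
user_f, item_f, history = implicit_als(counts)
scores = [dot(user_f[0], item_v) for item_v in item_f]  # predicted preferences for user 0
```

Because each half-step solves its least squares subproblem exactly, the recorded loss in `history` should not increase from one iteration to the next. A distributed implementation would partition the per-user and per-item solves across cluster nodes rather than loop over them as done here.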
Keywords/Search Tags: Recommendation System, Implicit Feedback, Alternating Least Squares, Spark Big Data Processing Platform
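
The two evaluation measures named in the abstract, RMSE and the speedup ratio, can be computed as in the sketch below. The abstract does not define either measure, so the standard definitions are assumed: RMSE is the square root of the mean squared difference between predicted and observed values, and the speedup ratio is the run time of a baseline configuration divided by the run time on the larger cluster. The function names and the numbers in the usage lines are hypothetical and are not results from the thesis.

```python
import math

def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

def speedup(baseline_seconds, cluster_seconds):
    """Speedup ratio: baseline run time divided by run time on the larger cluster."""
    return baseline_seconds / cluster_seconds

# Made-up values for illustration only.
example_rmse = rmse([2.0, 4.0], [0.0, 0.0])   # sqrt((4 + 16) / 2) = sqrt(10)
example_speedup = speedup(120.0, 40.0)        # 3.0
```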