| Research on recommender systems addresses how to quickly and effectively extract, from massive data, the information or goods that users are interested in. The arrival of the big data era has made this need more pressing. Although recommender systems have made progress in both theory and application, traditional recommendation algorithms still take a long time to process big data and cannot meet the real-time requirements of online recommendation. The Spark in-memory computing platform provides technical support for improving the efficiency and real-time performance of recommendation algorithms. This paper aims to optimize and parallelize recommendation algorithms on the Spark platform, and to use a stream computing framework to build a recommendation system that supports both offline and online recommendation. Based on Spark and related big data technologies, the work covers two parts. (1) Optimization and parallelization of collaborative filtering algorithms. First, because the user-based recommendation method is inefficient in iterative computation, a clustering-based algorithm on Spark is used to speed it up. Second, because the ALS (alternating least squares) method ignores similarity information, a KNN-ALS model that combines ALS with KNN (k-nearest neighbors) is proposed. In addition, a modified similarity measure is used to reduce the effect of dimension differences on user similarity. (2) Implementation of a recommendation system on the Spark platform. Using big data technology, a recommendation system that serves both offline and online recommendation is built on Spark. The offline design focuses on the data warehouse and the recommendation engine, while the online part is based on Kafka and Spark Streaming. For offline recommendation, experiments on a movie dataset show that the improved parallel recommendation algorithms pass the tests on the relevant evaluation metrics. In addition, the system's data warehouse not only offers a clear read/write performance advantage over the traditional storage mode, but also saves a significant amount of storage space. For online recommendation, loading stream data and updating the model in a dynamic data environment are accomplished by combining Kafka and Spark Streaming. |
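
The abstract does not define its modified similarity measure, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes that "dimension difference" refers to users whose rating vectors cover different sets of items, and it uses a hypothetical correction: cosine similarity over co-rated items, scaled by the Jaccard overlap of the two users' rated-item sets. The function names and the toy ratings are invented for illustration, and the code runs on plain Python dictionaries instead of Spark.

```python
from math import sqrt


def cosine_on_common(a, b):
    """Cosine similarity computed only over items rated by both users."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def overlap_weighted_similarity(a, b):
    """Hypothetical correction (an assumption, not the paper's measure):
    scale the cosine score by the Jaccard overlap of the rated-item sets,
    so that a pair sharing few items cannot reach a high similarity."""
    union = a.keys() | b.keys()
    if not union:
        return 0.0
    jaccard = len(a.keys() & b.keys()) / len(union)
    return cosine_on_common(a, b) * jaccard


def top_k_neighbors(ratings, user, k):
    """Return the k users most similar to `user` as (user_id, score) pairs."""
    scores = [
        (other, overlap_weighted_similarity(ratings[user], ratings[other]))
        for other in ratings
        if other != user
    ]
    # Sort by descending score, breaking ties by user id for determinism.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


# Toy data: u2 shares three items with u1, u3 shares only one.
ratings = {
    "u1": {"m1": 5.0, "m2": 3.0, "m3": 4.0},
    "u2": {"m1": 5.0, "m2": 3.0, "m3": 4.0, "m4": 2.0},
    "u3": {"m1": 5.0, "m5": 1.0, "m6": 2.0, "m7": 4.0},
}

print(top_k_neighbors(ratings, "u1", 2))
```

On this toy data, plain cosine over co-rated items scores u2 and u3 as equally similar to u1, even though u3 shares only one movie with u1. The overlap weighting ranks u2 first, which is the kind of effect a dimension-aware similarity measure is meant to produce when selecting KNN neighbors.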