Among the many recommendation techniques, collaborative filtering has become the core algorithm of many recommender systems because of its good recommendation quality and performance. However, the traditional collaborative filtering algorithm analyzes only user rating data, which limits its recommendation accuracy. In addition, most existing studies compute on a single node and therefore cannot meet scalability requirements when faced with massive data. To address these problems of low accuracy and poor scalability, this study improves the user-based collaborative filtering algorithm and designs and implements a parallel version of the improved algorithm on the Spark platform. The main contributions are as follows:

1. A collaborative filtering algorithm based on improved user clustering is proposed. To overcome the low accuracy caused by relying on rating data alone, a user item-genre attention model is constructed from the frequency with which each user accesses each item genre. Fuzzy C-means clustering is improved in both the selection of initial cluster centers and the distance metric, and collaborative filtering recommendation is then performed within the target user's cluster. Experimental results show that with 40 neighbors and 25 clusters, the mean absolute error (MAE) of this algorithm is 3.41% lower than that of the traditional collaborative filtering algorithm.

2. A collaborative filtering algorithm that incorporates user interest factors is proposed. To account for data sparsity and for user interest that drifts over time, a data preprocessing step of matrix filling and rating correction is introduced. Because the algorithm based on improved user clustering does not fully consider user interest when computing similarity, a user-genre difference model built from users' ratings of item genres is constructed to improve the traditional similarity calculation. Collaborative filtering recommendation is then performed on the basis of the improved user clustering. Experimental results show that under the same clustering conditions and with 40 neighbors, the MAE of this algorithm is 2.85% lower than that of the collaborative filtering algorithm based on improved user clustering, indicating that it further improves recommendation accuracy.

3. A parallel implementation scheme for the improved user-based collaborative filtering algorithm is designed. To address the scalability problem of recommender systems, the collaborative filtering algorithm based on user clustering and user interest is divided, following its processing flow, into three stages: matrix partitioning with rating correction and filling, user clustering, and collaborative filtering recommendation. Each stage is then designed and implemented in parallel. Experimental results show that on the MovieLens-1M dataset, the running time of the algorithm on a 4-node Spark cluster is 50.16% lower than on a single node, demonstrating that the Spark-based parallelization scheme effectively addresses the scalability problem of the system.
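The cluster-then-recommend pipeline behind contributions 1 and 2 can be sketched in plain Python. This is a minimal single-node sketch under stated assumptions, not the thesis's implementation. The genre-attention vector is assumed to be each genre's share of a user's accesses. The clustering is standard fuzzy C-means (random initial memberships, Euclidean distance) rather than the improved variant, whose initialization and distance changes are not specified here. Similarity is plain Pearson correlation, without the user-genre difference model. All function names are hypothetical.

```python
import math
import random


def genre_attention(user_items, item_genres, genres):
    """Assumed attention model: share of a user's accesses that fall in each genre."""
    counts = {g: 0 for g in genres}
    for item in user_items:
        for g in item_genres.get(item, ()):
            counts[g] += 1
    total = sum(counts.values()) or 1  # avoid division by zero for users with no genres
    return [counts[g] / total for g in genres]


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Standard FCM (random initial memberships, Euclidean distance), not the improved variant."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random membership matrix whose rows sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([x / s for x in row])
    centers = []
    for _ in range(iters):
        # Cluster centers are membership**m weighted means of the points.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / sw for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        for i in range(n):
            dists = [max(math.dist(points[i], ctr), 1e-12) for ctr in centers]
            u[i] = [1.0 / sum((dists[j] / dk) ** (2 / (m - 1)) for dk in dists) for j in range(c)]
    return u, centers


def mean_rating(r):
    return sum(r.values()) / len(r)


def pearson(ra, rb):
    """Pearson correlation over the items both users rated (0.0 if fewer than 2)."""
    common = set(ra) & set(rb)
    if len(common) < 2:
        return 0.0
    ma = sum(ra[i] for i in common) / len(common)
    mb = sum(rb[i] for i in common) / len(common)
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    den = math.sqrt(sum((ra[i] - ma) ** 2 for i in common)) * math.sqrt(
        sum((rb[i] - mb) ** 2 for i in common))
    return num / den if den else 0.0


def predict(target, item, ratings, cluster_members, k=40):
    """Mean-centered user-based CF, searching neighbors only inside the target's cluster."""
    mean_t = mean_rating(ratings[target])
    neighbors = [(pearson(ratings[target], ratings[v]), v)
                 for v in cluster_members if v != target and item in ratings[v]]
    neighbors.sort(reverse=True)
    top = [(s, v) for s, v in neighbors[:k] if s > 0]  # keep the k most similar, positive only
    if not top:
        return mean_t  # fall back to the target user's mean rating
    num = sum(s * (ratings[v][item] - mean_rating(ratings[v])) for s, v in top)
    den = sum(abs(s) for s, _ in top)
    return mean_t + num / den


def mae(pairs):
    """Mean absolute error over (predicted, actual) pairs."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)
```

In use, one would build `genre_attention` vectors for all users, run `fuzzy_c_means` (for example with `c=25`), assign each user to the cluster with the highest membership, call `predict` with `k=40` using only the members of the target user's cluster, and score held-out ratings with `mae`. Restricting the neighbor search to one cluster shrinks the candidate set compared with searching all users.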