| In recent years, the rapid development of Internet technology has driven exponential growth in information, bringing us into an era of information overload. Recommendation algorithms are currently one of the most effective ways to address this problem, and collaborative filtering is the most successful of them in practice. A collaborative filtering recommendation algorithm finds the nearest-neighbor set of the target user or item from users' histories of subscribing to or browsing items, predicts the rating of the target item from the users' ratings of the items in that nearest-neighbor set, and finally recommends the top-ranked items to the user. The algorithm is widely recognized in both academia and industry, but it also suffers from problems such as cold start, data sparsity, and poor scalability. The emergence of cloud computing has attracted wide interest among researchers and offers new solutions to these problems. This paper combines the Hadoop cloud computing platform with clustering techniques to study collaborative filtering recommendation in this new setting. The main work and contributions of this paper are as follows. 1. The problem of random initial-center selection in the traditional K-means and Canopy algorithms is analyzed. An MVC-Kmeans (K-means based on the Minimum Variance Canopy) algorithm is proposed, which uses minimum variance to obtain globally optimal Canopy centers as the initial K-means cluster centers, and its implementation is described in detail. The parallel MVC-Kmeans algorithm is evaluated on standard UCI datasets. The results show that, compared with the traditional K-means clustering algorithm, this method achieves better clustering quality and faster convergence, and is suitable for cluster analysis of large-scale data. 2. For recommender systems, the collaborative filtering recommendation algorithm is analyzed in depth. To address its data sparsity and scalability problems, a collaborative filtering recommendation algorithm based on MVC-Kmeans clustering is proposed, and the principle and implementation of each stage are described in detail. The main idea is as follows: first, the ALS (alternating least squares) matrix factorization technique is used to preprocess the sparse rating matrix; then, MVC-Kmeans clustering is applied to the filled rating matrix to build an item clustering model; finally, item-based collaborative filtering recommendation is performed on the candidate set determined by the clustering model. The algorithm is also evaluated on the MovieLens movie dataset in terms of parameter settings, recommendation quality, data sparsity, and speedup. The results show that the proposed method is robust across datasets of different sparsity, achieves better recommendation quality than the other collaborative filtering recommendation algorithms compared, and obtains good speedup on datasets of different sizes. |
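
The final stage described above, item-based collaborative filtering restricted to the candidate set given by the item clusters, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `cosine` and `predict_rating`, the toy rating data, the use of cosine similarity, and the similarity-weighted average. The abstract does not state which similarity measure or prediction formula is used, and the rating matrix is assumed to be already filled (for example by ALS) and already clustered.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict_rating(ratings, clusters, user, item, k=2):
    """Predict `user`'s rating of `item` from the k most similar items
    that fall in the same cluster as `item` (the candidate set).

    ratings:  dict item -> list of ratings, one entry per user (filled matrix)
    clusters: dict item -> cluster id (hypothetical output of a clustering step)
    """
    # Candidate set: other items in the same cluster as the target item.
    candidates = [j for j in ratings
                  if j != item and clusters[j] == clusters[item]]

    def without_user(vec):
        # Drop the target user's column so the similarity does not
        # depend on the rating being predicted.
        return vec[:user] + vec[user + 1:]

    target_vec = without_user(ratings[item])
    scored = sorted(
        ((cosine(target_vec, without_user(ratings[j])), j) for j in candidates),
        reverse=True,
    )[:k]

    # Similarity-weighted average of the user's ratings on the neighbors.
    denom = sum(abs(sim) for sim, _ in scored)
    if denom == 0:
        return None  # no usable neighbors in the candidate set
    return sum(sim * ratings[j][user] for sim, j in scored) / denom

# Toy data (hypothetical): four items rated by three users.
toy_ratings = {
    "A": [5, 4, 1],
    "B": [5, 5, 1],
    "C": [1, 1, 5],
    "D": [4, 4, 2],
}
toy_clusters = {"A": 0, "B": 0, "D": 0, "C": 1}

if __name__ == "__main__":
    print(predict_rating(toy_ratings, toy_clusters, user=2, item="A"))
```

Restricting the neighbor search to one cluster is what reduces the number of similarity computations: item "C" is never compared with "A" here because the two items lie in different clusters.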