
Performance Improvement Of K-means Algorithm And Its Application In Movie Recommender System

Posted on: 2018-03-08  Degree: Master  Type: Thesis
Country: China  Candidate: Y N Wang  Full Text: PDF
GTID: 2348330533466283  Subject: Computer technology
Abstract/Summary:
With the rapid development and spread of Internet technology, massive amounts of data are being produced, and clustering analysis of this data can bring great commercial value. The K-means algorithm has therefore been widely studied and applied. Because the data to be mined by clustering is generally massive and sparse, the traditional K-means algorithm is prone to memory overflow owing to its running mechanism and calculation strategy. To address this efficiency drawback, the parallel sampling K-means algorithm was proposed, but it suffers from unstable clustering results and too many iterations. This thesis aims to improve the performance of the parallel sampling K-means algorithm and to apply it in a practical recommender system. The main research work is as follows:
First, an improved parallel sampling K-means algorithm, IPSK (Improved Parallel Sampling K-means), is proposed. The algorithm draws multiple samples in parallel from the full data set and computes initial cluster centers for each sample; the optimal initial centers among the samples are then selected to form a cluster-center matrix. The elements of this matrix are clustered in turn, and the resulting centers are taken as the initial cluster centers for the full data set. Experimental results show that computing initial centers from samples in this way makes them more representative, reduces the algorithm's sensitivity to the choice of initial centers, and yields accurate and stable clustering of large data sets.
Second, IPSK is introduced into user-based collaborative filtering, and a user-clustering collaborative filtering recommendation algorithm based on IPSK (IPSK-UCF) is designed.
Finally, a movie recommendation system is designed and implemented, and the application of IPSK-UCF in a practical recommender system is studied.
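The abstract does not specify how each sample's initial centers are computed, what makes a sample "optimal", or how the center matrix is clustered. The following Python sketch of the IPSK seeding idea therefore fills in assumed details: each sample's centers come from a plain K-means run with random seeding, the "optimal" samples are assumed to be those with the lowest within-sample SSE (sum of squared errors), and the matrix rows are clustered with K-means starting from the best sample's centers. The function names (`kmeans`, `ipsk_init`, `ipsk_kmeans`) and parameters (`n_samples`, `sample_size`, `keep`) are hypothetical. Samples are processed in a sequential loop here; in a distributed setting such as Hadoop, each sample could be handled by a separate task.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], init: List[Point],
           max_iter: int = 300) -> Tuple[List[Point], float]:
    """Lloyd's K-means from the given initial centers; returns (centers, SSE)."""
    centers = [tuple(c) for c in init]
    for _ in range(max_iter):
        clusters: List[List[Point]] = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: dist2(p, centers[j]))
            clusters[nearest].append(p)
        # New center = mean of its members; an empty cluster keeps its old center.
        new_centers = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments are stable: converged
            break
        centers = new_centers
    sse = sum(min(dist2(p, c) for c in centers) for p in points)
    return centers, sse


def ipsk_init(data: List[Point], k: int, n_samples: int = 8, sample_size: int = 60,
              keep: int = 3, seed: int = 0) -> List[Point]:
    """Hypothetical IPSK-style seeding; selection and clustering rules are assumptions."""
    rng = random.Random(seed)
    per_sample = []
    # Each sample is independent, so these iterations could run as parallel tasks.
    for _ in range(n_samples):
        sample = rng.sample(data, min(sample_size, len(data)))
        centers, sse = kmeans(sample, rng.sample(sample, k))
        per_sample.append((sse, centers))
    # "Optimal" samples: assumed here to be those with the lowest within-sample SSE.
    per_sample.sort(key=lambda item: item[0])
    best = per_sample[:keep]
    # Cluster-center matrix: one row per center taken from the kept samples.
    matrix = [c for _, centers in best for c in centers]
    # Cluster the matrix rows, starting from the best sample's centers.
    init, _ = kmeans(matrix, best[0][1])
    return init


def ipsk_kmeans(data: List[Point], k: int, **kwargs) -> Tuple[List[Point], float]:
    """Run K-means on the full data set from the IPSK-style initial centers."""
    return kmeans(data, ipsk_init(data, k, **kwargs))


if __name__ == "__main__":
    gen = random.Random(1)
    blobs = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
    data = [(cx + gen.gauss(0, 0.5), cy + gen.gauss(0, 0.5))
            for cx, cy in blobs for _ in range(60)]
    centers, sse = ipsk_kmeans(data, k=3)
    print(sorted((round(x, 1), round(y, 1)) for x, y in centers), round(sse, 1))
```

Only the final `kmeans` call touches the whole data set; all other work is done on small samples and on the short center matrix. This is the property that makes the sampling approach attractive for data that is too large to cluster repeatedly.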
The movie recommendation system discovers users' interests and preferences from their movie ratings and browsing history and recommends movies accordingly. The design and implementation of the system are described in detail, and its effectiveness is demonstrated.
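The abstract does not detail how IPSK-UCF scores movies. As a hedged illustration, the Python sketch below assumes that user clusters have already been computed (for example by running IPSK over user rating vectors) and restricts ordinary user-based collaborative filtering to the target user's cluster. It further assumes that neighbours are ranked by cosine similarity of rating vectors (unrated movies treated as 0) and that unseen movies are scored by a similarity-weighted average of the neighbours' ratings. The names `recommend`, `cluster_of`, `top_n`, and `k_neighbors`, and the sample users and movies, are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Ratings = Dict[str, Dict[str, float]]  # user -> {movie: rating}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two rating vectors, treating unrated movies as 0."""
    dot = sum(a[m] * b[m] for m in a.keys() & b.keys())
    na = math.sqrt(sum(r * r for r in a.values()))
    nb = math.sqrt(sum(r * r for r in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(ratings: Ratings, cluster_of: Dict[str, int], user: str,
              top_n: int = 3, k_neighbors: int = 5) -> List[Tuple[str, float]]:
    """Hypothetical IPSK-UCF-style recommendation: neighbours come only from the user's cluster."""
    mine = ratings[user]
    # Candidate neighbours: other users assigned to the same cluster.
    peers = [u for u, c in cluster_of.items() if c == cluster_of[user] and u != user]
    sims = sorted(((cosine(mine, ratings[u]), u) for u in peers), reverse=True)[:k_neighbors]
    scores: Dict[str, float] = {}
    weights: Dict[str, float] = {}
    for sim, u in sims:
        if sim <= 0:
            continue
        for movie, r in ratings[u].items():
            if movie in mine:  # only predict movies the user has not rated
                continue
            scores[movie] = scores.get(movie, 0.0) + sim * r
            weights[movie] = weights.get(movie, 0.0) + sim
    # Similarity-weighted average rating per candidate movie.
    predicted = {m: scores[m] / weights[m] for m in scores}
    return sorted(predicted.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


if __name__ == "__main__":
    ratings = {
        "alice": {"Alien": 5, "Heat": 4},
        "bob": {"Alien": 5, "Heat": 4, "Ronin": 5},
        "carol": {"Alien": 4, "Ronin": 3, "Up": 2},
        "dave": {"Up": 5, "Coco": 5},
    }
    cluster_of = {"alice": 0, "bob": 0, "carol": 0, "dave": 1}
    print(recommend(ratings, cluster_of, "alice"))
```

Restricting the neighbour search to one cluster is what the clustering step buys: similarities are computed only against members of the target user's cluster rather than against every user in the system. In the example, `dave` falls in another cluster, so his movies are never considered for `alice`.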
Keywords/Search Tags: Data Mining, K-means Algorithm, Hadoop, Collaborative Filtering Recommendation