
Implementation And Application Of Fuzzy Clustering Algorithm Based On Spark

Posted on: 2020-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Wu
Full Text: PDF
GTID: 2428330590995691
Subject: Computer technology
Abstract/Summary:
Data volumes are growing exponentially. How to extract concise, valuable information from massive data quickly and accurately by means of data mining has become a research hotspot, as has the question of how to use data mining to improve the accuracy of recommendation systems. This thesis studies the fuzzy clustering algorithm and its application to collaborative filtering recommendation. First, to improve the efficiency of fuzzy clustering, a parallelization scheme for the fuzzy c-means (FCM) algorithm is designed and implemented based on the characteristics of the big data processing platform Spark. The scheme stores the underlying data in a distributed manner on HDFS, uses the RDD mechanism to transform the data during computation, and persists intermediate results so that they can be reused. Clustering experiments on the KDD CUP 99 data set show that the Spark-based parallel FCM algorithm achieves better clustering accuracy and timeliness. Second, an algorithm named FCM-UserCF is designed, which combines the parallel FCM algorithm with the user-based collaborative filtering recommendation algorithm UserCF. FCM-UserCF uses FCM to cluster all users into several local user groups, so that the nearest-neighbor search performed by UserCF changes from global to local. It fills the user-item rating matrix with the Slope One algorithm to overcome data sparsity and improve real-time performance, and then uses UserCF to compute user similarity and nearest neighbors, generate predicted ratings, and make Top-N recommendations. Recommendation experiments on the MovieLens data set verify that FCM-UserCF effectively addresses the data sparsity problem and improves recommendation accuracy. Finally, a simple prototype of an e-mall recommendation system is developed, and the FCM-UserCF algorithm is applied in its recommendation module. The experimental and application results demonstrate the validity and practical value of the work done in this thesis.
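For readers unfamiliar with FCM, the following is a minimal single-machine sketch of the standard fuzzy c-means iteration in plain Python. It is not the thesis's Spark implementation, which is not reproduced in this abstract. The function names, the fuzzifier default `m=2.0`, the tolerance, and the caller-supplied initial centers are all assumptions made for illustration. In a Spark setting, one would presumably compute the per-point memberships in a map step over an RDD and aggregate the weighted sums for the centers in a reduce step, but that mapping is likewise an assumption here.

```python
import math


def update_memberships(points, centers, m=2.0):
    """Return u[i][j], the degree to which point i belongs to cluster j."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centers:
            # split its membership equally among those centers.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        # Standard FCM rule: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        memberships.append(
            [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]
        )
    return memberships


def update_centers(points, memberships, m=2.0):
    """Return each center as the mean of all points weighted by u_ij^m."""
    dim = len(points[0])
    centers = []
    for j in range(len(memberships[0])):
        weights = [u[j] ** m for u in memberships]
        total = sum(weights)
        centers.append(
            [sum(w * x[d] for w, x in zip(weights, points)) / total
             for d in range(dim)]
        )
    return centers


def fuzzy_c_means(points, init_centers, m=2.0, tol=1e-6, max_iter=100):
    """Alternate the two updates until no center moves by tol or more."""
    centers = [list(c) for c in init_centers]
    for _ in range(max_iter):
        memberships = update_memberships(points, centers, m)
        new_centers = update_centers(points, memberships, m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # Recompute memberships so they match the final centers.
    return centers, update_memberships(points, centers, m)


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of 2-D points.
    data = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
    final_centers, u = fuzzy_c_means(data, init_centers=[(0, 0), (10, 10)])
    print(final_centers)
    print(u)
```

Unlike hard clustering, each row of the returned membership matrix sums to 1 across all clusters, so a user near the boundary of two groups would keep a meaningful degree of membership in both.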
Keywords/Search Tags: Fuzzy clustering, Spark, Collaborative filtering, Recommendation system