
Research on Dynamic Recommendation Parallelization Algorithm Based on Clustering

Posted on: 2018-04-15  Degree: Master  Type: Thesis
Country: China  Candidate: L L Li
GTID: 2358330515451393  Subject: Software engineering
Abstract/Summary:
At present, the amount of data is increasing dramatically with the popularity of the Internet and mobile devices, and this vast amount of information has led to serious information overload. How to quickly analyze users' interests within abundant information and recommend items of interest to them has become a hot topic in current research. As one of the effective ways to solve this problem, the collaborative filtering recommendation algorithm achieves personalized recommendation by building a model from users' preference information and historical data. However, as the data size grows, problems such as data sparsity, real-time performance, and accuracy become increasingly serious, which significantly degrades the recommendation quality of the Slope One algorithm. Focusing on the problems of low accuracy, high computational complexity, and slow running speed, this thesis carries out the following work:

(1) It analyzes and summarizes the concepts and workflows of collaborative filtering recommendation algorithms, similarity measures, and clustering algorithms. In addition, it introduces the architecture, workflow, and setup process of the Hadoop platform and the Spark framework.

(2) It proposes SBTICK-means, a parallel clustering algorithm based on the Spark framework. First, the data are preprocessed with Canopy. Then, during the iterative K-means computation, the triangle inequality is used to eliminate redundant distance computations and accelerate clustering. Experimental results show that the proposed algorithm improves clustering efficiency while maintaining accuracy, and that its sizeup, scaleup, and running speed also improve.

(3) It puts forward a weighted Slope One algorithm based on clustering and the Spark framework. First, a time weight is incorporated into the traditional rating similarity so that the dynamic change of users' interests over time is reflected. Second, a comprehensive similarity is computed by introducing item attributes, and the nearest-neighbor set is generated with the previously proposed SBTICK-means algorithm. Finally, rating prediction and recommendation are performed in combination with a time decay function. Experimental results show that the improved algorithm is more accurate than both the traditional Slope One algorithm and Slope One based on user similarity, and that it runs more efficiently than an implementation on the Hadoop platform.

In summary, this thesis starts from both the basic idea and the shortcomings of the Slope One algorithm, optimizes its rating-prediction accuracy, real-time performance, and scalability, and realizes full parallelization on the Spark framework. The work significantly improves the accuracy and efficiency of clustering and recommendation, and it has research value and practical significance for further study of recommendation on massive data.
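The abstract names the two ingredients of SBTICK-means (Canopy preprocessing and triangle-inequality pruning inside the K-means iterations) but not its exact steps. The following is a minimal single-machine sketch under stated assumptions, not the thesis's implementation: Canopy with hypothetical thresholds `t1 > t2` supplies the initial centers and their number, and during assignment K-means skips the distance to any center `c2` whenever `d(c, c2) >= 2 * d(x, c)` for the point's current center `c`, because the triangle inequality then guarantees `d(x, c2) >= d(x, c)`. The Spark parallelization is omitted, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy(points: List[Vector], t1: float, t2: float) -> List[Tuple[Vector, List[Vector]]]:
    """Canopy pre-clustering with loose threshold t1 and tight threshold t2 (t1 > t2)."""
    if t1 <= t2:
        raise ValueError("Canopy requires t1 > t2")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)
        # Points within t1 join this canopy (a point may join several canopies).
        members = [center] + [p for p in remaining if dist(p, center) < t1]
        canopies.append((center, members))
        # Points within t2 are too close to become centers of new canopies.
        remaining = [p for p in remaining if dist(p, center) >= t2]
    return canopies


def kmeans_triangle(points: List[Vector], centers: List[Vector],
                    max_iter: int = 100, tol: float = 1e-9):
    """K-means that prunes center comparisons with the triangle inequality.

    Returns (centers, assignment, skipped); `skipped` counts the distance
    computations avoided by the bound d(c, c2) >= 2 * d(x, c).
    """
    centers = [list(c) for c in centers]
    k = len(centers)
    assign = [0] * len(points)
    skipped = 0
    for _ in range(max_iter):
        # Center-to-center distances, computed once per iteration.
        cc = [[dist(centers[i], centers[j]) for j in range(k)] for i in range(k)]
        for idx, x in enumerate(points):
            best = assign[idx]
            best_d = dist(x, centers[best])
            for j in range(k):
                if j == best:
                    continue
                # d(best, j) >= 2 * d(x, best) implies d(x, j) >= d(x, best),
                # so center j cannot be closer and d(x, j) is never computed.
                if cc[best][j] >= 2.0 * best_d:
                    skipped += 1
                    continue
                d = dist(x, centers[j])
                if d < best_d:
                    best, best_d = j, d
            assign[idx] = best
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, assign, skipped
```

Usage: on two well-separated groups of points, `canopy(points, 5.0, 3.0)` yields two seeds, which are passed as `[c for c, _ in canopies]` to `kmeans_triangle`; points already close to their own center never compute the distance to the far center, and the `skipped` counter makes that saving visible.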
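The abstract does not say how the time weight and item attributes enter the comprehensive similarity, nor whether users or items are being compared. As one plausible reading only, the sketch below assumes item-item similarity: a cosine over co-rating users in which each pair of ratings is weighted by an exponential decay `exp(-lam * (t_now - t))` of the older timestamp, blended linearly with Jaccard similarity over item attribute sets. The blend weight `alpha`, the decay rate `lam`, the data layout, and all function names are hypothetical.

```python
import math
from typing import Dict, Set, Tuple

# Hypothetical layout: user -> {item: (rating, timestamp)}
Ratings = Dict[str, Dict[str, Tuple[float, float]]]


def time_weighted_cosine(ratings: Ratings, a: str, b: str, t_now: float, lam: float) -> float:
    """Cosine similarity of items a and b over co-rating users, each term time-decayed."""
    dot = norm_a = norm_b = 0.0
    for items in ratings.values():
        if a in items and b in items:
            (r_a, t_a), (r_b, t_b) = items[a], items[b]
            # Assumed decay: exponential in the age of the older of the two ratings.
            w = math.exp(-lam * (t_now - min(t_a, t_b)))
            dot += w * r_a * r_b
            norm_a += w * r_a * r_a
            norm_b += w * r_b * r_b
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no co-rating users: no rating evidence
    return dot / math.sqrt(norm_a * norm_b)


def jaccard(x: Set[str], y: Set[str]) -> float:
    """Attribute overlap len(x & y) / len(x | y); 0.0 when both sets are empty."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def comprehensive_similarity(ratings: Ratings, attrs: Dict[str, Set[str]], a: str, b: str,
                             t_now: float, alpha: float = 0.7, lam: float = 0.05) -> float:
    """Hypothetical linear blend of time-weighted rating similarity and attribute similarity."""
    return (alpha * time_weighted_cosine(ratings, a, b, t_now, lam)
            + (1.0 - alpha) * jaccard(attrs[a], attrs[b]))
```

For example, two items whose co-ratings are identical and which share one of two attributes score 0.7 * 1.0 + 0.3 * 0.5 = 0.85 under the default blend. A linear blend is used here only because it keeps the score in [0, 1] when both parts are non-negative; the thesis may combine the terms differently.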
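For the prediction step, standard weighted Slope One computes, for each item `i` the user has rated, the average deviation `dev(j, i)` between the target item `j` and `i` over users who rated both, and then averages the estimates `dev(j, i) + r(u, i)` weighted by the number of such users. The sketch below is an assumed adaptation, not the thesis's exact formula: each co-rating user's contribution is multiplied by an exponential time-decay weight (so the support count becomes a sum of weights), and an optional `neighbors` list, for example the members of the target user's cluster, restricts which users contribute. With `lam = 0` and no neighbor restriction it reduces to plain weighted Slope One. The names and data layout are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical layout: user -> {item: (rating, timestamp)}
Ratings = Dict[str, Dict[str, Tuple[float, float]]]


def decay_weight(t_now: float, t: float, lam: float) -> float:
    """Assumed exponential time decay: 1.0 when t == t_now, smaller for older t."""
    return math.exp(-lam * (t_now - t))


def predict_slope_one(ratings: Ratings, user: str, target: str, t_now: float,
                      lam: float = 0.0,
                      neighbors: Optional[Iterable[str]] = None) -> Optional[float]:
    """Time-decayed weighted Slope One prediction of `user`'s rating for `target`."""
    own = ratings[user]
    diff = defaultdict(float)  # item i -> weighted sum of (r_v,target - r_v,i)
    wsum = defaultdict(float)  # item i -> sum of weights (plain count when lam == 0)
    pool = neighbors if neighbors is not None else ratings.keys()
    for v in pool:
        items = ratings[v]
        if target not in items:
            continue
        r_t, t_t = items[target]
        for i, (r_i, t_i) in items.items():
            if i == target or i not in own:
                continue
            # Assumption: the older of the two ratings determines the weight.
            w = decay_weight(t_now, min(t_t, t_i), lam)
            diff[i] += w * (r_t - r_i)
            wsum[i] += w
    num = den = 0.0
    for i, ws in wsum.items():
        if ws <= 0.0:
            continue  # weight underflowed to zero: no usable evidence
        dev = diff[i] / ws               # weighted average deviation dev(target, i)
        num += (dev + own[i][0]) * ws    # estimate from item i, weighted by its support
        den += ws
    return num / den if den > 0 else None
```

On a three-user example (A rates i1, i2, i3 as 5, 3, 2; B rates i1, i2 as 3, 4; C rates i2, i3 as 2, 5), plain weighted Slope One predicts 13/3 for C on i1: the deviations are 0.5 (support 2) and 3 (support 1), giving ((0.5 + 2) * 2 + (3 + 5) * 1) / 3. Restricting `neighbors` to B alone changes the prediction to 1.0, and a large `lam` that makes A's old ratings negligible moves it toward the same value.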
Keywords/Search Tags: Slope One, Clustering, Spark, Time Weight, Item Attribute