
Research And Application Of Distributed Recommendation Algorithm Based On Fuzzy Clustering

Posted on: 2019-11-13
Degree: Master
Type: Thesis
Country: China
Candidate: X W Ren
Full Text: PDF
GTID: 2428330548481383
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of Internet 3.0, information interaction services between user terminals and the entire network have been realized. With industry growing at this scale, information overload has become increasingly serious. Against this background, personalized recommendation, which filters information and recommends items automatically, can alleviate information overload and improve the interaction between users and products, so this line of research has broad application prospects and practical significance. Among recommendation algorithms, the most widely used is collaborative filtering (CF), but CF suffers from problems such as data sparsity, limited scalability, and cold start. A common way to improve CF is to cluster the rating matrix so that users or items are grouped into categories, which allows the neighbor set to be searched more efficiently and accurately and thereby improves recommendation quality. Hadoop, an efficient distributed computing platform, can serve as the computing platform for distributed recommendation algorithms.

This thesis starts from the clustering algorithm. First, to address the weakness of the density peak clustering (DPC) algorithm when it partitions data using the Euclidean distance measure, we propose a shared-neighbor distance measure to improve it. Second, because the clustering result of the fuzzy c-means (FCM) algorithm depends on its initial parameter settings, we use the DPC algorithm's strategy for selecting center points to initialize FCM and propose a density-peak-based fuzzy c-means clustering algorithm (DPCND-FCM). We then propose a MapReduce-based version of DPCND-FCM (MrDPCND-FCM). Finally, to address data sparsity and scalability in CF, we use MrDPCND-FCM to improve the CF algorithm and implement a distributed collaborative filtering recommendation algorithm based on fuzzy clustering. The details are as follows:

1. To address the weakness of the DPC algorithm when it partitions data using the Euclidean distance measure, we propose a density peak clustering algorithm based on shared-neighbor distance (DPCND). Based on the concepts of adaptive similarity and shared nearest neighbors, this thesis proposes a shared-neighbor similarity measure and applies it to the DPC algorithm. Comparison experiments on UCI data sets and artificial data sets show that DPCND, by using shared-nearest-neighbor similarity, captures the distribution characteristics of data sets with complex structure more objectively and improves clustering accuracy.

2. Because the clustering result of the FCM algorithm depends on its initial parameter settings, we use the DPC algorithm's strategy for selecting center points to initialize FCM and propose a density-peak-based fuzzy c-means clustering algorithm (DPCND-FCM). DPCND-FCM uses the DPC algorithm to select the center points and then clusters iteratively with FCM. It selects the center points more accurately and reduces the number of iterations. Experiments on UCI data sets and artificial data sets show that the center points chosen by DPCND-FCM are more accurate and that its clustering quality is also better.

3. To address the high time complexity of DPCND-FCM on large data sets, and drawing on the characteristics of the MapReduce parallel computing model, such as row-wise data reading in Map, sorting in Shuffle, and merged computation in Reduce, we propose a MapReduce-based DPCND-FCM algorithm (MrDPCND-FCM). In the Hadoop environment, MrDPCND-FCM runs three MapReduce jobs in sequence to select the initial center points and then carries out the FCM iterations as a loop of MapReduce jobs, thus realizing clustering in a distributed environment. On the Hadoop platform, we test single-node and cluster performance on the large-scale UCI USCensus1990 raw data set. The test results show that MrDPCND-FCM achieves a good speedup ratio and scales well.

4. To address data sparsity and scalability in the CF algorithm, we propose a distributed collaborative filtering recommendation algorithm based on DPCND-FCM clustering (MrDF-CF). First, the DPCND-FCM fuzzy clustering algorithm is used to cluster the user-item rating matrix of the CF algorithm. Fuzzy clustering is closer to the real world because it allows a target user to belong to several user clusters. MrDF-CF filters out the user clusters in which the target user's membership is too low and obtains a candidate user set. It then applies user-based collaborative filtering to the candidate user set, adding the membership degree as a weight in the prediction formula. For scalability, we design the flow of the improved collaborative filtering algorithm for a distributed environment. Tests on the MovieLens data set show that MrDF-CF achieves higher recommendation quality than the traditional CF algorithm and also scales better.

5. This thesis designs and implements a Hadoop-based book recommendation system. The system provides the main functions of book recommendation, algorithm modeling, algorithm evaluation, and so on. It is divided into two parts: the book recommendation subsystem and the big data processing subsystem. When the test data set and the chosen recommendation algorithm are submitted from the user front end, the big data processing subsystem executes the algorithm program and returns the results to the front end. The system's algorithm evaluation shows that the MrDF-CF distributed recommendation algorithm proposed in this thesis can recommend personalized books to users and also achieves better performance indicators.
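
The idea in item 2 of the abstract, seeding FCM with centers chosen by density peaks instead of random initial values, can be illustrated with a minimal single-machine sketch. It is not the thesis implementation: it uses plain Euclidean distance rather than the proposed shared-neighbor distance, a simple cutoff count for density, and the function names (`density_peak_centers`, `membership_row`, `fcm`) and parameters (`cutoff`, `m`, `tol`) are hypothetical choices made for illustration.

```python
import math


def euclidean(p, q):
    """Euclidean distance (an assumption; the thesis uses a shared-neighbor distance)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def density_peak_centers(points, k, cutoff):
    """Pick k initial centers as the points with the largest rho * delta."""
    n = len(points)
    dist = [[euclidean(p, q) for q in points] for p in points]
    # rho: local density, counted as the number of neighbors closer than cutoff.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]
    # Visit points from dense to sparse; density ties are broken by index.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    for pos, i in enumerate(order):
        if pos == 0:
            # The densest point gets its largest distance to any point.
            delta[i] = max(dist[i])
        else:
            # delta: distance to the nearest point that ranks denser.
            delta[i] = min(dist[i][j] for j in order[:pos])
    ranked = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)
    return [list(points[i]) for i in ranked[:k]]


def membership_row(point, centers, m):
    """FCM memberships of one point: u_c proportional to d_c ** (-2 / (m - 1))."""
    d = [euclidean(point, c) for c in centers]
    if 0.0 in d:
        # The point coincides with a center: give it full membership there.
        hits = d.count(0.0)
        return [1.0 / hits if x == 0.0 else 0.0 for x in d]
    inv = [x ** (-2.0 / (m - 1.0)) for x in d]
    total = sum(inv)
    return [v / total for v in inv]


def fcm(points, centers, m=2.0, max_iter=100, tol=1e-6):
    """Standard FCM iterations started from the given centers."""
    dims = len(points[0])
    for _ in range(max_iter):
        u = [membership_row(p, centers, m) for p in points]
        new_centers = []
        for c in range(len(centers)):
            w = [row[c] ** m for row in u]
            total = sum(w)
            new_centers.append(
                [sum(wi * p[d] for wi, p in zip(w, points)) / total
                 for d in range(dims)])
        shift = max(euclidean(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, [membership_row(p, centers, m) for p in points]


# Toy data: two well-separated groups of four points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
seeds = density_peak_centers(points, k=2, cutoff=0.5)
centers, memberships = fcm(points, seeds)
print(seeds)
print(centers)
```

Because the seeds already lie inside the dense regions, the FCM loop has little distance to cover before the centers stop moving, which is the intuition behind the reduced iteration count reported in the abstract. Each row of `memberships` sums to 1, and it is this kind of per-cluster membership degree that item 4 describes using as a weight in the prediction formula.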
Keywords/Search Tags: density peak clustering, fuzzy C-means algorithm, Hadoop, MapReduce, collaborative filtering recommendation algorithm