
Application Research Of Recommendation Algorithm Based On Spark

Posted on: 2020-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z D Li
GTID: 2428330596978789
Subject: Computer technology
Abstract/Summary:
In recent years, with the development of big data technology, the Spark computing framework has become increasingly popular and is used in a wide range of application scenarios. The growth in the number of Internet users has brought explosive growth of data on the network, and the analysis, mining and feature extraction of this data has become an important research direction. At the same time, with the development of artificial intelligence, machine learning and deep learning algorithms are being applied more and more widely. These algorithms repeatedly use data to train and optimize a model, and ultimately perform classification or clustering. A recommendation system can be regarded as a kind of classification system, because it divides items into categories such as those a user likes and those a user dislikes. It extracts useful information from the large volume of user behaviour data on the network and then recommends the corresponding items to users.

Implementing a recommendation algorithm in a conventional computing mode takes a lot of time and system resources, because the algorithm iterates over a large amount of data, produces a large amount of intermediate data, and has to store that intermediate data. To address this problem, recommendation algorithms have been moved onto distributed computing platforms. Spark makes it possible to run recommendation algorithms in parallel. It introduced an abstract dataset called the RDD (Resilient Distributed Dataset) to provide a high degree of fault tolerance for data. Its underlying design is memory-based computing, so the intermediate results of one iteration can be kept in memory for the next iteration instead of being repeatedly written to and read from disk, which greatly reduces computing time. A comparison of domestic and international studies from recent years on recommendation algorithms in the Spark computing framework shows that the efficiency of the algorithms is greatly improved on the Spark platform.

This thesis studies technologies related to recommendation algorithms on the Spark platform, mainly covering the following aspects. First, it examines the architecture and computing principles of the Spark framework, and implements three algorithms on the Spark platform: a content-based recommendation algorithm, a user-based collaborative filtering recommendation algorithm, and a collaborative filtering recommendation algorithm based on ALS (Alternating Least Squares) matrix factorization. Second, it designs a parallel implementation of a combined recommendation method, and analyzes in detail the advantages and disadvantages of each algorithm as well as its implementation process. Third, it optimizes details of the algorithms. For the content-based recommendation algorithm, a preprocessing step for the data features is designed so that item similarity can be calculated more conveniently and reliably. For user-based collaborative filtering, the similarity calculation additionally takes into account the similarity of users' potential interests, that is, the similarity of the items that users with similar ratings have viewed.

The experimental results show that the recommendation algorithms run much more efficiently on the Spark platform. The combined recommendation strategy produces more accurate results than any of the individual methods on its own. In addition, optimizing the algorithms and preserving the features helps improve the accuracy and other metrics of the recommendation results.
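To make the user-based collaborative filtering idea mentioned in the abstract more concrete, the following is a minimal single-machine sketch in plain Python. It is not the thesis's implementation: it does not use Spark, it does not include the potential interest similarity term described above, and the function names, the cosine similarity measure, the neighbour count `k` and the sample ratings are all assumptions made for illustration.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(ratings, user, item, k=2):
    """Predict a rating as the similarity-weighted average of the ratings
    given to `item` by the k users most similar to `user`."""
    neighbours = [
        (cosine_similarity(ratings[user], r), r[item])
        for other, r in ratings.items()
        if other != user and item in r
    ]
    # Keep only positively similar neighbours, most similar first.
    neighbours = sorted((p for p in neighbours if p[0] > 0),
                        key=lambda p: p[0], reverse=True)[:k]
    total = sum(s for s, _ in neighbours)
    if total == 0:
        return None  # no usable neighbour has rated this item
    return sum(s * x for s, x in neighbours) / total


# Hypothetical ratings: user -> {item: rating}
ratings = {
    "alice": {"m1": 5.0, "m2": 3.0},
    "bob": {"m1": 5.0, "m2": 3.0, "m3": 4.0},
    "carol": {"m1": 1.0, "m2": 5.0, "m3": 2.0},
}

if __name__ == "__main__":
    print(predict(ratings, "alice", "m3"))
```

In a distributed setting the pairwise similarity computation is the expensive step, which is the kind of iterative, intermediate-data-heavy work the abstract says benefits from Spark's in-memory processing.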
Keywords/Search Tags: Recommendation Algorithm, Collaborative Filtering, Alternating Least Squares, Hadoop, Spark