
Design And Implementation Of Recommendation System Based On Spark

Posted on: 2022-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: L Chen
Full Text: PDF
GTID: 2518306785975619
Subject: Trade Economy
Abstract/Summary:
With the rapid development of internet technology, the volume of data that users produce and encounter keeps growing, leaving people adrift in an "information ocean". Recommendation systems have therefore become the preferred way to help users filter useful information out of massive data. In practice, however, the user experience suffers from three problems: the original data are high-dimensional and sparse, user or item similarity is calculated one-sidedly, and recommendation results have poor real-time performance. To address these problems, this thesis improves both the system architecture and the recommendation algorithm using big data processing technology. For the architecture, a recommendation system is built on the distributed Hadoop architecture, combined with the distributed storage system HBase and the big data computing platform Spark. For the algorithm, an improved collaborative filtering algorithm is proposed after a comprehensive analysis of the strengths, weaknesses, and application scenarios of existing recommendation algorithms. The main work is as follows.

First, recommendation systems and big data processing technology are studied. The thesis discusses the theory of recommendation systems in terms of system architecture and recommendation algorithms, which provides the theoretical basis for the later system design and algorithm improvement. The two most popular big data processing frameworks, Hadoop and Spark, are then analyzed and compared in depth. Taking into account the characteristics of recommendation systems, the strengths of both frameworks are combined to design a big data recommendation system with Spark on YARN as its computing core. The system offers high availability, fast computation, and low maintenance cost.

Second, the data acquisition system is optimized. Flume with a custom source component collects offline data, and Nginx connects directly to Kafka to collect real-time data. Together, the two collection pipelines give the recommendation system a stable, fast, and easily extensible way to collect data from different sources.

Third, Spark's core components are analyzed and the collaborative filtering algorithm is improved. The combined use of Spark Core, Spark SQL, Spark Streaming, and MLlib in practical application scenarios is discussed first, followed by an in-depth discussion of the recommendation system based on the improved collaborative filtering algorithm. Through multi-dimensional analysis of user data, a collaborative filtering algorithm based on the F2-LSHT-CF model is established and implemented with the Spark API.

Fourth, a recommendation system based on Spark on YARN is built to perform data collection, data storage, and data analysis. The Audioscrobbler data set, the experimental data set for the music recommendation system, is fed into the source of the data acquisition system. Spark, the computing core of the recommendation system, then continuously pulls the collected data and uses Spark Streaming to handle abnormal records and normalize the format. Finally, the normalized data are passed to the improved collaborative filtering algorithm, which computes the recommendation results and makes recommendations to users.

The recommendation system has good applicability and recommendation performance in most business scenarios, can provide a basis for internet companies to achieve precision marketing, and offers guidance for follow-up research on recommendation systems. The performance of the system shows that a big data solution built on the Hadoop distributed architecture and the Spark computing platform can readily handle the storage and computation of massive data, and that the collaborative filtering algorithm based on the F2-LSHT-CF model can deliver more accurate recommendations.
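
For readers unfamiliar with collaborative filtering, the general idea can be illustrated with a minimal sketch. This is plain user-based collaborative filtering with cosine similarity over play counts, not the F2-LSHT-CF model, whose details the abstract does not give. The function names `cosine` and `recommend` and the sample data are hypothetical, and the sketch runs on a single machine rather than on Spark.

```python
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(plays, user, top_n=3):
    """Rank artists the user has not played by similarity-weighted play counts."""
    scores = defaultdict(float)
    for other, vec in plays.items():
        if other == user:
            continue
        sim = cosine(plays[user], vec)
        if sim <= 0.0:
            continue  # users with no overlap contribute nothing
        for artist, count in vec.items():
            if artist not in plays[user]:
                scores[artist] += sim * count
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical user -> artist -> play count data, in the spirit of
# an implicit-feedback music data set; not taken from the thesis.
plays = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 4, "b": 2, "c": 6},
    "u3": {"d": 7},
}

print(recommend(plays, "u1"))
```

On this sample data, user u1 overlaps only with u2, so the only candidate is the artist that u2 has played and u1 has not. The sparsity and one-sided similarity problems named in the abstract show up even here: u3 shares no artists with anyone and therefore receives no recommendations at all.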
Keywords/Search Tags: recommendation system, Hadoop, Spark, collaborative filtering algorithm, data collection