
Design And Implementation Of Recommender System Based On Hadoop Platform And Spark Framework

Posted on: 2019-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: H Sun
GTID: 2428330590962948
Subject: Engineering
Abstract/Summary:
With the arrival of the big data era has come the problem of information overload. One way to address it is to use a recommendation system to make personalized recommendations for users. Traditional recommendation systems scale poorly, take a long time to compute, and run into computational and storage bottlenecks on massive data. Recommending from a massive volume of information to a large number of users in a short time has therefore become a serious challenge for recommendation systems. The Hadoop platform can handle the massive data processing that such a system requires. This thesis designs a mobile app recommendation system based on the Hadoop platform and the Spark framework. The system combines an item-based collaborative filtering algorithm with a logistic regression algorithm, and transfers user access data from a relational database to a Hive data warehouse for processing. First, in Hive, a series of operations such as creating intermediate tables and calling Python scripts performs sample extraction, feature extraction, and training-data construction, turning the raw data into the input data for modeling. Then, based on product characteristics and user behavior, the logistic regression algorithm is run in the Spark computing framework to build the model, which makes the abstract notion of user preference concrete and produces a model file. Finally, the generated model file is deployed to the online Dubbox project, which makes recommendations according to the model. The item-based collaborative filtering algorithm finds the association between two items and is a recommendation algorithm commonly used in industry. Logistic regression is a binary classification algorithm: its input is a set of features and its trained output is a weight for each feature. Combining collaborative filtering with logistic regression allows recommendations to be personalized for each user. The thesis evaluates the system on three main indicators: processing performance, accuracy, and AUC. Experiments show that the proposed system, which combines logistic regression with collaborative filtering, has clear advantages on the cold-start problem.
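The two algorithms named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the thesis's implementation, which runs on Hive and Spark. The toy install data, the two features (similarity to installed apps, app popularity), the cosine similarity measure, and all function names are assumptions made for illustration. The first function computes item-to-item similarity from co-occurrence, and the second trains a logistic regression whose output is one weight per feature.

```python
import math
from collections import defaultdict
from itertools import combinations


def item_similarity(user_items):
    """Cosine similarity between items, from binary user -> set-of-items data."""
    item_count = defaultdict(int)  # users who interacted with each item
    co_count = defaultdict(int)    # users who interacted with both items of a pair
    for items in user_items.values():
        for item in items:
            item_count[item] += 1
        for a, b in combinations(sorted(items), 2):
            co_count[(a, b)] += 1
    sim = {}
    for (a, b), c in co_count.items():
        score = c / math.sqrt(item_count[a] * item_count[b])
        sim[(a, b)] = score
        sim[(b, a)] = score
    return sim


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, labels, lr=0.5, epochs=500):
    """Batch gradient descent on log loss; returns (feature weights, bias)."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


def predict(weights, bias, x):
    """Estimated probability of the positive class (for example, a click)."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


# Hypothetical toy data: which apps each user installed.
installs = {
    "u1": {"maps", "weather", "news"},
    "u2": {"maps", "weather"},
    "u3": {"news", "chess"},
}
sim = item_similarity(installs)

# Hypothetical features per sample: [similarity to installed apps, app popularity].
rows = [[1.0, 0.2], [0.9, 0.8], [0.1, 0.3], [0.2, 0.9]]
labels = [1, 1, 0, 0]
weights, bias = train_logistic_regression(rows, labels)
p_similar = predict(weights, bias, [0.95, 0.5])
p_dissimilar = predict(weights, bias, [0.10, 0.5])
```

In this sketch the collaborative filtering score is fed to the logistic regression as one feature among others, which is one plausible way to combine the two algorithms. The abstract does not specify how the thesis combines them.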
Keywords/Search Tags:Logistic regression, Hadoop, Collaborative filtering, Spark