
An Item-based Collaborative Filtering Recommendation Algorithm Optimization And Parallel Implementation On Spark Platform

Posted on: 2018-09-25
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
GTID: 2348330542961691
Subject: Software engineering
Abstract/Summary:
With the rapid development and widespread adoption of Internet technology, massive amounts of Internet data are being generated. In the era of big data, both information consumers and information producers face the serious challenge of information overload, and extracting valuable information from big data has become a significant research topic. In this context, collaborative filtering is one of the key technologies for mitigating information overload in both practice and research, and some large e-commerce and information-publishing sites have already adopted it. However, collaborative filtering still suffers from the cold-start problem and from low prediction accuracy. Moreover, as data volumes keep growing, traditional stand-alone implementations run into performance bottlenecks in data storage and computation and can no longer meet users' needs. Running recommendation algorithms on a distributed computing platform offers a new way to address this. Among the many distributed computing platforms, Hadoop's MapReduce model provides only two operations, Map and Reduce, and its I/O overhead makes iterative tasks expensive to compute. The Spark platform provides the resilient distributed dataset (RDD) abstraction and an in-memory computing model. These features compensate for the shortcomings of MapReduce and make Spark better suited to massive-data computing scenarios, so it has become a research focus in big data processing. This thesis studies the traditional item-based collaborative filtering recommendation algorithm and improves it by combining item attribute similarity with user rating similarity, which eases the item cold-start problem caused by sparse data. On this basis, the algorithm is improved further by introducing the time information of user behavior. Finally, the algorithm is designed and implemented in parallel on the Spark platform to improve its parallel execution efficiency. The main research work of this thesis covers the following aspects:
1. It analyzes the principle, algorithmic process, and existing problems of the item-based collaborative filtering algorithm.
2. It proposes a method that combines item attribute similarity with the similarity of users' item ratings. The method alleviates the cold-start problem caused by sparse data and the decay of users' interests over time. The time and contextual information of user behavior is also modeled into the recommendation model to improve recommendation quality. Experiments show that the improved algorithm produces better recommendations than the traditional algorithm.
3. It implements the optimized item-based collaborative filtering algorithm in parallel on the Spark computing platform. The experimental results show that the Spark implementation achieves better parallel performance, scales to large data sets, and improves the parallel execution efficiency of the algorithm.
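The classic item-based collaborative filtering algorithm analyzed in item 1 can be sketched in two parts. First, items are compared by the cosine similarity of their rating vectors. Second, a user's unknown rating for an item is predicted as a similarity-weighted average of that user's ratings on the most similar items. This is a minimal sketch of the textbook formulation, not the thesis's code. The toy ratings, the helper names, and the neighbourhood size `k` are hypothetical.

```python
import math
from collections import defaultdict

# Toy ratings (user -> {item: rating}); the values are illustrative only.
ratings = {
    "u1": {"i1": 5.0, "i2": 3.0, "i3": 4.0},
    "u2": {"i1": 4.0, "i2": 2.0},
    "u3": {"i2": 5.0, "i3": 1.0},
}


def item_vectors(ratings):
    """Invert user -> {item: rating} into item -> {user: rating}."""
    items = defaultdict(dict)
    for user, row in ratings.items():
        for item, r in row.items():
            items[item][user] = r
    return dict(items)


def cosine(a, b):
    """Cosine similarity of two sparse rating vectors keyed by user."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def predict(user, target, ratings, items, k=2):
    """Similarity-weighted average of the user's ratings on the k items
    most similar to `target` (k is a hypothetical neighbourhood size)."""
    scored = sorted(
        ((cosine(items.get(target, {}), items.get(j, {})), r)
         for j, r in ratings[user].items() if j != target),
        reverse=True,
    )[:k]
    num = sum(s * r for s, r in scored if s > 0)
    den = sum(s for s, _ in scored if s > 0)
    return num / den if den else None


items = item_vectors(ratings)
prediction = predict("u2", "i3", ratings, items)
```

Here `prediction` comes out at roughly 3.06. That lies between u2's ratings of i2 (2.0) and i1 (4.0), pulled toward i1 because i1 is the item more similar to i3. Note that a brand-new item with no ratings gets similarity 0 to everything, which is the item cold-start problem the thesis targets.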
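One common way to combine item attribute similarity with rating similarity, as described in item 2, is a linear blend of the two scores. The abstract does not give the thesis's exact formula. The blend weight `alpha`, the use of Jaccard similarity over attribute sets, and all names and data here are therefore assumptions for illustration only.

```python
import math

# Hypothetical data: rating vectors (item -> {user: rating}) and
# attribute sets (item -> set of tags, e.g. genres). Illustrative only.
item_ratings = {
    "i1": {"u1": 5.0, "u2": 4.0},
    "i2": {"u1": 3.0, "u2": 2.0},
    "i3": {},  # a new item with no ratings yet (item cold start)
}
item_attrs = {
    "i1": {"action", "sci-fi"},
    "i2": {"action", "drama"},
    "i3": {"sci-fi"},
}


def cosine(a, b):
    """Rating similarity: cosine of two rating vectors; 0.0 if no users overlap."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def jaccard(a, b):
    """Attribute similarity: |A & B| / |A | B|; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_sim(i, j, alpha=0.7):
    """Linear blend: alpha * rating sim + (1 - alpha) * attribute sim.
    alpha = 0.7 is a hypothetical default, not a value from the thesis."""
    return (alpha * cosine(item_ratings[i], item_ratings[j])
            + (1 - alpha) * jaccard(item_attrs[i], item_attrs[j]))
```

Item i3 has no ratings, so its rating similarity to i1 is 0. The shared `sci-fi` attribute still gives a hybrid similarity of 0.3 × 0.5 = 0.15, enough for i3 to appear in neighbourhoods. Falling back on attributes in this way is what makes such blends useful against item cold start. How the weight is chosen (fixed, tuned on validation data, or varied with rating counts) is a design decision the abstract does not describe.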
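The abstract states that the time information of user behavior is introduced to account for the decay of users' interests, but it does not specify the decay function. The following minimal sketch assumes an exponential decay controlled by a hypothetical half-life parameter. Each neighbour's weight in the item-based prediction becomes its similarity multiplied by a time factor, so older ratings count for less. The function names, the day-based timestamps, and the half-life value are all illustrative assumptions.

```python
def time_weight(t_rated, t_now, half_life_days=30.0):
    """Hypothetical exponential decay: a rating that is half_life_days old
    counts half as much as a fresh one."""
    age = max(0.0, t_now - t_rated)
    return 0.5 ** (age / half_life_days)


def predict_time_aware(user_history, sim_to_target, t_now, half_life_days=30.0):
    """user_history: list of (item, rating, timestamp_in_days).
    sim_to_target: item -> similarity of that item to the target item.
    Each neighbour's weight is similarity * time decay."""
    num = den = 0.0
    for item, rating, t in user_history:
        s = sim_to_target.get(item, 0.0)
        if s <= 0:
            continue
        w = s * time_weight(t, t_now, half_life_days)
        num += w * rating
        den += w
    return num / den if den else None


# Two equally similar neighbours: a recent 5-star rating and a 60-day-old 1-star one.
history = [("a", 5.0, 100.0), ("b", 1.0, 40.0)]
sims = {"a": 0.5, "b": 0.5}
p = predict_time_aware(history, sims, t_now=100.0)
```

With a 30-day half-life, the 60-day-old rating carries a quarter of the weight of the fresh one. The prediction therefore moves from 3.0 (no decay) to 4.2, tracking the user's recent taste.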
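The abstract does not detail how the parallel implementation in item 3 is structured. A common way to parallelize item-item cosine similarity runs in two steps. First, each user emits partial dot products for every pair of items that user rated. Second, those partial products are summed per item pair. On Spark, these stages correspond to RDD transformations such as `groupByKey`, `flatMap`, and `reduceByKey`. The sketch below emulates those stages in plain Python so it runs without a cluster. It is an assumed design for illustration, not the thesis's Spark code, and the data is made up.

```python
import math
from collections import defaultdict
from itertools import combinations

# (user, item, rating) triples, as they might be loaded into an RDD; illustrative only.
triples = [
    ("u1", "i1", 5.0), ("u1", "i2", 3.0), ("u1", "i3", 4.0),
    ("u2", "i1", 4.0), ("u2", "i2", 2.0),
    ("u3", "i2", 5.0), ("u3", "i3", 1.0),
]

# Stage 1 (like groupByKey on user): collect each user's (item, rating) list.
by_user = defaultdict(list)
for user, item, rating in triples:
    by_user[user].append((item, rating))


# Stage 2 (like flatMap): each user independently emits ((i, j), r_i * r_j)
# for every pair of items they rated, so users can be processed in parallel.
def emit_pairs(item_ratings):
    for (i, ri), (j, rj) in combinations(sorted(item_ratings), 2):
        yield (i, j), ri * rj


# Stage 3 (like reduceByKey with addition): sum partial dot products per pair.
dots = defaultdict(float)
for item_ratings in by_user.values():
    for key, value in emit_pairs(item_ratings):
        dots[key] += value

# Squared item norms: another map to (item, r * r) plus a sum per item.
sq_norms = defaultdict(float)
for _, item, rating in triples:
    sq_norms[item] += rating * rating

# Final map: cosine similarity for every co-rated item pair.
sims = {(i, j): d / math.sqrt(sq_norms[i] * sq_norms[j]) for (i, j), d in dots.items()}
```

Only item pairs that share at least one user are ever emitted, which keeps the shuffled data proportional to co-ratings rather than to all item pairs. The per-user pair emission can still grow quadratically for very active users, which is a known cost of this decomposition.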
Keywords/Search Tags: big data, Spark, parallelization, collaborative filtering recommendation, personalized recommendation