
Research Of Collaborative Filtering Algorithm Based On Hadoop Cloud Platform

Posted on: 2015-09-24
Degree: Master
Type: Thesis
Country: China
Candidate: S W Li
GTID: 2308330482455611
Subject: Computer technology
Abstract/Summary:
Nowadays, the rapid development of e-commerce has greatly changed the way people live: we can get what we want from home, sitting at a computer or using any terminal connected to the Internet. However, the explosion of information that gives us so many choices also makes choosing harder. We need to filter irrelevant information out of the mass of available information and identify what we actually prefer. Recommender systems emerged against this background; their function is to discover items a user is likely to want, based on that user’s hobbies and interests. Enterprises and data-mining researchers have implemented recommender systems with a variety of methods. Collaborative filtering is currently one of the most widely used, including in e-commerce, so this dissertation makes collaborative filtering its central research topic.

First, building on a study of the traditional collaborative filtering algorithm, we propose an improved method based on non-negative matrix factorization (NMF). The method mines latent dimensions of user attributes and captures users’ hobbies and interests along those dimensions. Computing user similarity in this latent space makes the choice of neighbors more reasonable: it ensures that the users placed in the nearest-neighbor set are those whose hobbies and interests are closest to the active user’s. Experiments on the MovieLens data set show that the improved algorithm produces more accurate predictions than the traditional collaborative filtering algorithm. Moreover, to address the problem of user interest drift in collaborative filtering, this dissertation proposes an improved algorithm based on delta updates.
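
The NMF-based neighbor selection described in the paragraph above can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's exact formulation: it assumes Lee-Seung style multiplicative updates fitted only on observed ratings (0 marks a missing rating), cosine similarity between users' rows of the latent factor matrix W, and a similarity-weighted average over the top neighbors who rated the target item. The function names (`masked_nmf`, `predict`), the rank k=2, and the toy rating matrix are all hypothetical; the toy data is not MovieLens.

```python
import math
import random


def reconstruct(W, H, R):
    """W @ H, kept only on observed cells of R (0 marks a missing rating)."""
    k = len(H)
    return [[sum(W[u][f] * H[f][i] for f in range(k)) if R[u][i] else 0.0
             for i in range(len(R[0]))] for u in range(len(R))]


def masked_nmf(R, k, iters=300, seed=0, eps=1e-9):
    """Factorize R ~ W @ H with W, H >= 0 via multiplicative updates
    restricted to the observed ratings (hypothetical helper)."""
    rng = random.Random(seed)
    m, n = len(R), len(R[0])
    # Strictly positive start so multiplicative updates can move every entry.
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        P = reconstruct(W, H, R)
        for f in range(k):          # H <- H * (W^T R) / (W^T P)
            for i in range(n):
                num = sum(W[u][f] * R[u][i] for u in range(m))
                den = sum(W[u][f] * P[u][i] for u in range(m)) + eps
                H[f][i] *= num / den
        P = reconstruct(W, H, R)
        for u in range(m):          # W <- W * (R H^T) / (P H^T)
            for f in range(k):
                num = sum(R[u][i] * H[f][i] for i in range(n))
                den = sum(P[u][i] * H[f][i] for i in range(n)) + eps
                W[u][f] *= num / den
    return W, H


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict(R, W, user, item, n_neighbors=2):
    """Predict R[user][item] from the users most similar in latent space
    (cosine of their W rows) among those who have rated the item."""
    cands = [(cosine(W[user], W[v]), v) for v in range(len(R))
             if v != user and R[v][item]]
    top = sorted(cands, reverse=True)[:n_neighbors]
    wsum = sum(s for s, _ in top)
    if wsum == 0:
        return None  # no usable neighbors
    return sum(s * R[v][item] for s, v in top) / wsum


# Toy ratings (rows = users, columns = items, 0 = not rated).
R = [
    [5, 5, 0, 1],
    [5, 4, 1, 1],
    [4, 5, 1, 2],
    [1, 1, 5, 5],
    [2, 1, 4, 5],
]
W, H = masked_nmf(R, k=2)
pred = predict(R, W, user=0, item=2, n_neighbors=2)
print(round(pred, 2))
```

Here user 0 shares the taste of users 1 and 2 (high on items 0 and 1, low on items 2 and 3), so their latent rows point in a similar direction and the prediction for item 2 is pulled toward their low ratings rather than toward users 3 and 4.
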
The delta-update algorithm computes only the delta factors for the updated data and caches intermediate data to reduce the amount of computation. Experiments on the Netflix data set show that, in the same environment, the improved algorithm has a shorter run time than the traditional collaborative filtering algorithm.

Cloud computing and big data are among the hottest topics in today’s Internet community and are regarded as the core direction that will lead the next Internet revolution. Cloud computing emerged in response to big data and offers very powerful computing and storage capabilities. This dissertation considers how to use these advantages of cloud computing to solve the "big data" problem of the traditional collaborative filtering algorithm, namely its scalability. To this end, it adopts Hadoop, an open-source project of the Apache Software Foundation, as the core cloud development platform for the algorithm. To parallelize the improved algorithms above in the Hadoop cloud environment, the dissertation studies Hadoop’s distributed file system (HDFS) and the MapReduce programming paradigm in depth, and parallelizes both stages of the improved algorithm. Finally, experiments on the large-scale Netflix data set compare the improved algorithm with the traditional collaborative filtering algorithm in terms of accuracy and run time. The results show that the improved algorithm outperforms the traditional algorithm on both measures.
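
The delta-update and MapReduce ideas in the paragraph above can be illustrated together with one small sketch. It assumes, as an illustration only, that the "delta factors" and cached "middle data" correspond to per-user contributions to the two aggregates used by the standard (unmasked) multiplicative NMF update for H, namely W^T V and W^T W. Because both are sums of one term per user row, they fit a map step (emit each row's term) and a reduce step (element-wise sum); caching the sums lets a changed user row be handled by subtracting its old term and adding the new one. The code simulates MapReduce in-process with Python's built-in `map` and `functools.reduce`; it does not use Hadoop's API, and every function name here is hypothetical. Refitting the changed user's factor row is left out.

```python
from functools import reduce


def outer(a, b):
    return [[x * y for y in b] for x in a]


def add(A, B, sign=1.0):
    """Element-wise A + sign * B for equally shaped nested lists."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def map_row(w_u, v_u):
    """Map step: one user's contribution to (W^T V, W^T W)."""
    return outer(w_u, v_u), outer(w_u, w_u)


def reduce_pair(p, q):
    """Reduce step: sum two partial (W^T V, W^T W) pairs."""
    return add(p[0], q[0]), add(p[1], q[1])


def aggregate(W, V):
    """Full pass over all user rows; the result is the cached intermediate data."""
    return reduce(reduce_pair, map(map_row, W, V))


def delta_update(cache, w_old, v_old, w_new, v_new):
    """Refresh the cache after one user's row changes, without a full pass."""
    old_wv, old_ww = map_row(w_old, v_old)
    new_wv, new_ww = map_row(w_new, v_new)
    WtV, WtW = cache
    return (add(add(WtV, old_wv, -1.0), new_wv),
            add(add(WtW, old_ww, -1.0), new_ww))


def update_H(H, cache, eps=1e-9):
    """Multiplicative update H <- H * (W^T V) / (W^T W H) from cached sums."""
    WtV, WtW = cache
    k, n = len(H), len(H[0])
    WtWH = [[sum(WtW[f][g] * H[g][i] for g in range(k)) for i in range(n)]
            for f in range(k)]
    return [[H[f][i] * WtV[f][i] / (WtWH[f][i] + eps) for i in range(n)]
            for f in range(k)]


# Toy factors and ratings (hypothetical values).
W = [[1.0, 0.2], [0.9, 0.1], [0.1, 1.0]]
V = [[5, 4, 1], [4, 5, 1], [1, 1, 5]]
H = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

cache = aggregate(W, V)               # one full MapReduce-style pass
V_new = [row[:] for row in V]
V_new[2][0] = 4                       # user 2's interest shifts toward item 0
cache = delta_update(cache, W[2], V[2], W[2], V_new[2])
H_next = update_H(H, cache)
```

Refreshing the cache touches only the changed row, so its cost grows with k·(n + k) rather than with the number of users, while the refreshed sums match a full recomputation over the updated ratings.
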
Keywords/Search Tags: collaborative filtering, non-negative matrix factorization, delta-update, cloud computing, Hadoop