Design And Implementation Of Recommender System Based On Hadoop

Posted on: 2014-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: Z Tang
Full Text: PDF
GTID: 2268330401967108
Subject: Computer technology
Abstract/Summary:
With the rapid development of Internet technology, recommender systems have become the most important filtering tool for solving the information overload problem, helping users find useful content in massive data quickly and effectively. In practical applications, however, the number of item types and the number of users can be very large, and the log information in the system is generated and accumulated at an ever-increasing rate. Traditional recommendation algorithms run on a single machine and are limited by its performance, which is far from sufficient for recommendation computation on massive data.

To solve the scalability problems of recommender systems, we propose several system solutions based on Apache Hadoop, the open source distributed computing framework. Building on an in-depth study of the distributed file system HDFS and the MapReduce programming model, this thesis puts forward several distributed parallel implementations, based on MapReduce, of the Network-based recommendation algorithms proposed in recent years. On this basis, we have designed and implemented a recommender prototype system based on Hadoop. The main work is as follows:

1. We study the running mechanism of Hadoop and the MapReduce programming framework, and combine this with an in-depth analysis of recommender systems and recommendation algorithms, especially the Network-based recommendation algorithms represented by the probabilistic spreading algorithm and the heat spreading algorithm. We then design a MapReduce implementation of the Network-based recommendation algorithm on Hadoop. The complex computing tasks are decomposed into a flow of MapReduce jobs for distributed parallel processing on Hadoop and cloud computing platforms. Experimental tests show that the algorithm has good parallelism and scalability on a cluster.

2. Starting from the MapReduce solution for the Network-based recommendation algorithm, we apply optimization methods such as the combiner function and the sequence file I/O type, and analyze the long-tail distribution of the dataset and the implementation details of the computation process. We then propose improved MapReduce solutions for the Network-based recommendation algorithm on Hadoop, for instance using the Pair and Stripe approaches in the computation of the resource allocation matrix, and cutting down extremely active users. Experimental tests show that the improved methods further increase the efficiency of the algorithm.

3. We study the installation, deployment and usage of related open source software, including Hadoop, Mahout, Sqoop and Ganglia, and combine it with the MapReduce solution for the Network-based recommendation algorithm designed in the thesis. After the steps of requirements analysis, design of the system framework and process, implementation and testing, we design, implement and run a recommender prototype system based on Hadoop in a cluster environment consisting of multiple computers.
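
As an illustration of how the resource allocation matrix in items 1 and 2 might be computed with a Stripe-style MapReduce job, the following is a minimal Python sketch. It assumes the commonly used formulation of probabilistic spreading, W_ij = (1/k_j) * sum over users u of a_iu * a_ju / k_u, where k denotes the degree of an item or user in the bipartite user-item network. The abstract does not give the thesis's actual job flow, so the helper `run_job`, the function names and the toy log are all hypothetical, and the job runs in memory rather than on Hadoop.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """In-memory stand-in for one MapReduce job: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    results = []
    for key in sorted(groups):
        results.extend(reducer(key, groups[key]))
    return results


# Job 1: group the (user, item) log by user to get each user's item list.
def map_by_user(record):
    user, item = record
    yield user, item


def reduce_item_list(user, items):
    yield user, sorted(set(items))


# Job 2 (Stripe style): build column j of the resource allocation matrix W.
def map_stripes(record):
    user, items = record
    share = 1.0 / len(items)  # a user passes on resource equally: 1 / k_user
    for j in items:
        # One stripe per (user, j): every item i the user collected gets a share.
        yield j, {i: share for i in items}


def reduce_column(j, stripes):
    # One stripe arrives per user who collected j, so the count equals k_j.
    # This only holds because the sketch does not merge stripes in a combiner.
    k_j = len(stripes)
    column = defaultdict(float)
    for stripe in stripes:
        for i, share in stripe.items():
            column[i] += share
    yield j, {i: total / k_j for i, total in column.items()}


def recommend(user_items, columns, top_n=3):
    """Score unseen items i by summing W[i][j] over the user's items j."""
    scores = defaultdict(float)
    for j in user_items:
        for i, w in columns[j].items():
            if i not in user_items:
                scores[i] += w
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Toy log of (user, item) records; hypothetical data for illustration only.
log = [("u1", "a"), ("u1", "b"), ("u2", "b"), ("u2", "c"), ("u3", "a")]
profiles = run_job(log, map_by_user, reduce_item_list)
columns = dict(run_job(profiles, map_stripes, reduce_column))
print(recommend(dict(profiles)["u3"], columns))
```

In a Pair-style variant, the mapper would instead emit one record per (j, i) key with a single share as the value. That keeps each record small but produces many more intermediate records to shuffle, which is the usual trade-off between the two approaches.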
Keywords/Search Tags: Hadoop, recommender system, MapReduce, Network-based recommendation algorithm