
Research and Implementation of a Recommender System Based on Mahout and Hadoop

Posted on: 2017-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: G X Song
GTID: 2308330488468504
Subject: Computer technology
Abstract/Summary:
In recent years, with the rapid development of the Internet and of e-commerce as one of its representative applications, the volume of data and information has grown explosively. This makes it difficult for users to find the items they actually need among a large number of candidates. A detailed study of recommender systems, which play an increasingly important role in today's society, therefore has considerable practical significance. Improving the accuracy of a recommender system not only yields substantial economic benefits but also gives its users more personalized and convenient services.

Collaborative filtering has been applied widely and successfully in recommender systems, but its performance is unsatisfactory when the rating data are sparse. Starting from the basic concepts of recommendation algorithms, this thesis discusses several ways of calculating similarity in collaborative filtering and proposes a new similarity measure based on the Bhattacharyya coefficient. Experiments on public datasets released by MovieLens, Netflix and Yahoo Music verify the validity of the new similarity measure.

Because a recommender system is data-intensive and prone to explosive data growth, the thesis also analyzes the computing principles of the Hadoop distributed computing framework. It discusses in detail the recommendation algorithms in the well-known machine learning framework Mahout, and shows how Mahout simplifies the implementation of collaborative filtering with the Bhattacharyya coefficient as its similarity measure. The principle of combining the two frameworks is then discussed.

Finally, a system design and a prototype implementation are given. The implementation of the Bhattacharyya-coefficient-based collaborative filtering algorithm on Mahout is described in detail, together with its source code. Because a long-running system inevitably requires it, the scheme and steps for migrating the system to the Hadoop distributed computing framework are also given. By combining Mahout and Hadoop, the system handles the storage and computation of big data well.

In conclusion, the innovation of this thesis lies mainly in the following two points:
1) Because collaborative filtering algorithms rely on co-rated data, the results of a recommender system are not accurate enough when the data are sparse. We propose a new similarity measure based on the Bhattacharyya coefficient to address this problem. Experiments on public datasets demonstrate the validity of the new measure in sparse settings.
2) To make the collaborative filtering algorithm based on the Bhattacharyya coefficient usable in practice, we implemented it on the Mahout framework and give the source code of the key steps.
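The abstract does not state the formula of the proposed measure, so the following is only a minimal sketch of the underlying idea, not the thesis's implementation. It computes the standard Bhattacharyya coefficient, BC(p, q) = Σ_h sqrt(p_h · q_h), between the rating histograms of two items. The 1 to 5 star scale and the function names `rating_distribution` and `bhattacharyya_coefficient` are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

# Assumption: ratings are integers on a 1-5 star scale.
RATING_SCALE = (1, 2, 3, 4, 5)


def rating_distribution(ratings, scale=RATING_SCALE):
    """Return the empirical probability of each rating value on the scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in scale for r in ratings):
        raise ValueError("rating outside the assumed scale")
    counts = Counter(ratings)
    total = len(ratings)
    return [counts.get(value, 0) / total for value in scale]


def bhattacharyya_coefficient(ratings_a, ratings_b, scale=RATING_SCALE):
    """Bhattacharyya coefficient between two rating histograms.

    The result is 1 for identical distributions and 0 when the two
    histograms share no rating value.
    """
    p = rating_distribution(ratings_a, scale)
    q = rating_distribution(ratings_b, scale)
    return sum(sqrt(p_h * q_h) for p_h, q_h in zip(p, q))


if __name__ == "__main__":
    # The two items may have been rated by entirely different users.
    item_a = [5, 4, 5, 3]
    item_b = [4, 5, 5, 5, 2]
    print(bhattacharyya_coefficient(item_a, item_b))
```

Because the coefficient uses each item's whole rating histogram, it stays defined even when two items have no users in common. This is the property that makes it attractive for sparse data, where measures restricted to co-rated entries have little to work with.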
Keywords/Search Tags:recommender system, collaborative filtering, Bhattacharyya coefficient, sparse data, Mahout, Hadoop