
Research Of Recommender Algorithm On Cloud Platform

Posted on: 2015-11-10
Degree: Master
Type: Thesis
Country: China
Candidate: C Z Cheng
Full Text: PDF
GTID: 2298330431477047
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology, the explosive growth of data has not made it any easier for people to find the information they need. Both industry and academia have made innovative progress on the problem of information overload. Among these efforts, the recommender system is a powerful tool that offloads the hard work of information screening by providing personalized content recommendations. Recommender systems have become an integral part of many Internet applications. As the numbers of users and content items grow rapidly, a recommender system now faces the challenge of providing accurate recommendations by analyzing the massive amount of data generated by interactions between users and content.

Hadoop is a distributed platform and a mainstream choice for cloud computing. It makes storing and using big data convenient and provides parallelization across compute clusters of any size. As a result, MapReduce, the key technology of Hadoop, has become an important paradigm for solving large-scale machine learning problems. Machine learning is the discipline of using probability models and statistical methods to analyze historical data and make predictions for future data input. These technological innovations encourage us to consider new ideas for building recommender systems. This thesis studies the distributed computation technology of the Hadoop platform and employs theory and algorithms from machine learning to carry out research on recommender systems.

Our work can be outlined as follows. First, we study the classic recommendation algorithms and compare their accuracy. Their difficulty in scaling to big data is also discussed. Second, we propose a feature learning algorithm based on the linear regression model. The algorithm learns features of content and users from the ratings given by every user. The high-dimensional vectors that represent these features are used to predict unknown ratings, which then serve as the metric for generating personalized recommendations. Because the feature learning algorithm may have to deal with millions of users and items, training may involve on the order of billions of parameters. This problem is addressed by introducing the MapReduce paradigm on Hadoop to parallelize the feature learning algorithm.

When the feature learning algorithm is applied to the MovieLens data set to predict movie ratings, it yields better predictions than the traditional collaborative filtering method, which is based on the similarity of users and content. The result shows that the linear combination of user and content feature vectors is an effective way of generating unknown ratings. Furthermore, the prediction becomes more accurate as the dimension of the vectors increases. To take advantage of cloud computing, the data set format is revised to work together with the MapReduce framework. Experiments show that parallelizing the algorithm is feasible, so efficiency improvements are attainable when dealing with massive data sets for recommendation.
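
The abstract does not state the exact model or update rule, so the following is only a minimal sketch of how a feature learning step of this kind might be organized in map/reduce form. It assumes that a predicted rating is the inner product of a user vector and an item vector, that the loss is squared error with L2 regularization, and that training uses batch gradient descent. The function names (`map_rating`, `reduce_gradients`, `train`, `rmse`) and all hyperparameters are hypothetical, and the map and reduce steps run in a single process here rather than on Hadoop.

```python
import math
import random
from collections import defaultdict


def predict(user_vec, item_vec):
    """Assumed rating model: inner product of user and item feature vectors."""
    return sum(a * b for a, b in zip(user_vec, item_vec))


def map_rating(rating, user_vecs, item_vecs):
    """Map step: turn one (user, item, rating) record into keyed gradient pieces."""
    u, i, r = rating
    err = predict(user_vecs[u], item_vecs[i]) - r
    yield ("user", u), [err * q for q in item_vecs[i]]
    yield ("item", i), [err * p for p in user_vecs[u]]


def reduce_gradients(pairs, dim):
    """Reduce step: sum all gradient pieces that share the same key."""
    totals = defaultdict(lambda: [0.0] * dim)
    for key, grad in pairs:
        acc = totals[key]
        for k in range(dim):
            acc[k] += grad[k]
    return totals


def train(ratings, n_users, n_items, dim=2, lr=0.01, reg=0.01, epochs=1000, seed=0):
    """Batch gradient descent; each epoch is one simulated map/reduce pass."""
    rng = random.Random(seed)
    user_vecs = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_users)]
    item_vecs = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_items)]
    for _ in range(epochs):
        # Materialize every map output before updating, so all gradient
        # pieces are computed from the same parameter snapshot.
        pairs = [kv for rating in ratings
                 for kv in map_rating(rating, user_vecs, item_vecs)]
        for (kind, idx), grad in reduce_gradients(pairs, dim).items():
            vec = user_vecs[idx] if kind == "user" else item_vecs[idx]
            for k in range(dim):
                vec[k] -= lr * (grad[k] + reg * vec[k])
    return user_vecs, item_vecs


def rmse(ratings, user_vecs, item_vecs):
    """Root mean squared error of the predictions on the given ratings."""
    sq = [(predict(user_vecs[u], item_vecs[i]) - r) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(sq) / len(sq))
```

In this sketch each rating record is processed independently in the map step, which is what makes the gradient computation easy to distribute; only the summation per user or item key requires data to be brought together. A call such as `train(ratings, n_users, n_items)` on a list of `(user, item, rating)` tuples returns the learned vectors, and `rmse` can be used to compare the fit before and after training.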
Keywords/Search Tags: recommender system, feature learning, collaborative filtering, cloud computing, MapReduce