
The Recommendation Algorithm Based On Hadoop

Posted on: 2016-01-05    Degree: Master    Type: Thesis
Country: China    Candidate: J. F. Zhang    Full Text: PDF
GTID: 2308330503450751    Subject: Software engineering
Abstract/Summary:
With the rapid development of technology, the Internet has become ever more closely woven into people's lives, and more and more people obtain information from the network. As the Internet grows, however, the amount of information online is exploding, and it becomes increasingly difficult for users to find what they want in such a mass of data. This is the era of information overload.

To extract useful information in this era, the search engine was invented and achieved great success. But because suitable keywords are hard for users to formulate and search results lack personalization, search engines cannot fully meet users' requirements. The recommendation system is regarded as a better information-filtering tool than the search engine: it is more intelligent and more proactive. A recommendation system does not require keywords from the user; instead, it infers user preferences by analyzing historical records.

Recommendation systems nevertheless face many problems. One of them is that traditional recommendation algorithms are designed to run on a single machine. With the generation of huge volumes of data and the arrival of single-machine performance bottlenecks, traditional algorithms have become increasingly unable to meet the needs of real production environments. The limited storage scalability and computational scalability of a single machine severely restrict the development of recommendation systems.

In recent years the concept of big data has become more and more popular, and big data processing platforms have emerged one after another; among the most widely used is Hadoop. Hadoop provides users with many components, including HDFS and MapReduce, and a Hadoop cluster can increase its storage and computing power simply by adding nodes, which solves the scalability problem of recommendation systems.
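The MapReduce programming model mentioned above can be sketched in plain Python as a single-process simulation (an illustration of the model only, not Hadoop itself; the `map_reduce` helper, mapper, and reducer below are hypothetical names): the map phase emits key-value pairs, the framework groups values by key, and the reduce phase aggregates each group.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Single-process simulation of the MapReduce model:
    map each record to (key, value) pairs, shuffle/group by key,
    then reduce each key's values to a final result."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):   # map phase
            groups[key].append(value)       # shuffle: group values by key
    return {key: reducer(key, values) for key, values in groups.items()}

# Example: count how many users rated each item.
ratings = [("u1", "itemA"), ("u1", "itemB"), ("u2", "itemA")]
counts = map_reduce(
    ratings,
    mapper=lambda r: [(r[1], 1)],          # emit (item, 1) per rating
    reducer=lambda item, ones: sum(ones),  # sum the ones per item
)
print(counts)  # {'itemA': 2, 'itemB': 1}
```

On a real Hadoop cluster the shuffle step is what the framework distributes across nodes; the mapper and reducer are the only parts the algorithm designer writes.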
On the basis of an in-depth study of HDFS and MapReduce, this paper proposes a parallel implementation of the probabilistic spreading algorithm using the MapReduce programming model. The probabilistic spreading algorithm belongs to the family of network-based recommendation algorithms proposed in recent years. This paper also gives parallel implementations of an item-based collaborative filtering algorithm and a global ranking algorithm. The main work of this paper is:

1. Through analysis and study of the three algorithms, their complex computing tasks are decomposed into a series of MapReduce job flows for distributed parallel processing on Hadoop. The speedup of the parallel probabilistic spreading algorithm is measured, demonstrating that the parallelization scales well and solves the computational scalability problem of the traditional single-machine algorithm. In addition, this paper compares the hit rates of the three algorithms.

2. A cluster of eight virtual machine nodes is set up on four physical computers, and Hadoop is deployed on it. HDFS can expand its storage capacity simply by adding nodes, which solves the storage scalability problem of the recommendation system.

3. By analyzing job logs, the main bottlenecks of time-consuming jobs are identified. Based on an analysis of the operating principles of MapReduce, possible optimization steps are identified and a series of optimized parameters is given, which reduces the running time of MapReduce jobs and reduces the probability of individual task failures.
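To give a concrete sense of the kind of decomposition described above, here is one common way to split item-based collaborative filtering into two MapReduce-style jobs (a minimal single-process sketch; the thesis's exact job flow is not specified here, and the variable names are illustrative): the first job groups ratings by user into item lists, and the second emits co-rated item pairs and sums their co-occurrence counts, which serve as item-item similarity scores.

```python
from collections import defaultdict
from itertools import combinations

ratings = [("u1", "A"), ("u1", "B"), ("u1", "C"),
           ("u2", "A"), ("u2", "B"),
           ("u3", "B"), ("u3", "C")]

# Job 1 -- map: emit (user, item); reduce: collect each user's item list.
user_items = defaultdict(list)
for user, item in ratings:
    user_items[user].append(item)

# Job 2 -- map: emit every co-rated item pair from a user's list;
# reduce: sum the counts for each pair (a simple co-occurrence similarity).
cooccur = defaultdict(int)
for items in user_items.values():
    for a, b in combinations(sorted(items), 2):
        cooccur[(a, b)] += 1

print(dict(cooccur))  # {('A', 'B'): 2, ('A', 'C'): 1, ('B', 'C'): 2}
```

On Hadoop each "job" above would be a separate MapReduce pass over HDFS data, which is why the thesis describes the algorithms as a series of job flows rather than a single program.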
Keywords/Search Tags: MapReduce, HDFS, Big Data, Hadoop, Recommendation system