
Parallelization Research On Collaborative Filtering Algorithm Based On Cloud Computing

Posted on: 2014-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: B Y Li
Full Text: PDF
GTID: 2248330398976769
Subject: Computer application technology
Abstract/Summary:
With the development of information technology, the volume of data on the network is growing explosively. This information overload forces users to spend more time and energy finding useful information. In this context, recommendation systems were developed to help users find information that interests them. Collaborative filtering is currently a popular recommendation approach: it uses the similarity of interests between users to make recommendations from users' preference data. However, as the data grow, the computational efficiency of collaborative filtering declines. This thesis therefore uses parallel computing to study the computational efficiency of collaborative filtering on large data sets.

Cloud computing, which can be seen as a development of parallel computing technology, can effectively address the efficiency of complex computations. Hadoop is currently a popular cloud computing platform, and this thesis uses it as the implementation platform. On Hadoop, the key to parallelizing collaborative filtering is resolving the data dependencies that arise during the calculation. The Restricted Boltzmann Machine model and the k Nearest Neighbors model are taken as examples, and on the basis of a detailed analysis of their calculation processes, algorithms based on the Hadoop platform are proposed. Following the characteristics of the MapReduce framework, the algorithm splits the calculation into a number of tasks. Within each task, the data are replicated to each computing node through a data redundancy mechanism, which resolves the data dependencies in the calculation. In addition, once the calculation is split into multiple tasks, each task depends on the tasks before and after it. The algorithm uses dependency-based modular MapReduce jobs to implement the parallel computation, which resolves these dependencies between tasks.

Finally, experiments are used to verify the proposed algorithms. A comparative analysis of the Hadoop implementation and the previous implementation shows that the Hadoop platform improves the computational efficiency of nearest neighbor recommendation and Restricted Boltzmann Machines on large data sets.
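
The abstract does not give the thesis's actual algorithm, so the following is only a minimal sketch of how a nearest neighbor computation might be split into two dependent MapReduce-style jobs. It is an in-memory Python simulation, not Hadoop code. All names (run_job, map_by_item, reduce_user_pairs, reduce_cosine, k_nearest) are hypothetical, and the choice of cosine similarity over co-rated items is an assumption made for illustration. Job 1 replicates each rating into every user pair that shares an item, and job 2, which depends on job 1's output, sums those partial products per pair.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def run_job(records, mapper, reducer):
    """Simulate one MapReduce job in memory: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Job 1: group ratings by item so one reducer sees all users who rated it.
def map_by_item(record):
    user, item, rating = record
    yield item, (user, rating)


def reduce_user_pairs(item, user_ratings):
    # Replicate each rating into every user pair sharing this item
    # (a stand-in for the data redundancy idea; an assumption here).
    for (u, ru), (v, rv) in combinations(sorted(user_ratings), 2):
        yield (u, v), (ru * rv, ru * ru, rv * rv)


# Job 2: consumes job 1's output, so it can only start after job 1 finishes.
def map_identity(record):
    yield record


def reduce_cosine(pair, partials):
    # Cosine similarity restricted to the items both users rated.
    dot = sum(p[0] for p in partials)
    norm_u = sqrt(sum(p[1] for p in partials))
    norm_v = sqrt(sum(p[2] for p in partials))
    yield pair, dot / (norm_u * norm_v)


def k_nearest(similarities, user, k):
    """Return the k users most similar to `user`."""
    scored = []
    for (u, v), sim in similarities:
        if u == user:
            scored.append((sim, v))
        elif v == user:
            scored.append((sim, u))
    return [name for _, name in sorted(scored, reverse=True)[:k]]


# Toy (user, item, rating) records, invented for illustration only.
ratings = [
    ("alice", "i1", 5), ("alice", "i2", 3),
    ("bob", "i1", 5), ("bob", "i2", 3),
    ("carol", "i1", 1), ("carol", "i2", 5),
]
pair_partials = run_job(ratings, map_by_item, reduce_user_pairs)
similarities = run_job(pair_partials, map_identity, reduce_cosine)
print(k_nearest(similarities, "alice", 1))
```

On a real cluster the two jobs would be chained so that the second is submitted only after the first completes; here that ordering is simply the order of the two run_job calls.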
Keywords/Search Tags: Collaborative filtering, K Nearest Neighbors, Restricted Boltzmann Machines, Parallel processing, Cloud computing, Hadoop