Research And Implementation Of Collaborative Filtering Algorithm In Cloud Computing Environment

Posted on: 2017-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
Full Text: PDF
GTID: 2348330488487612
Subject: Computer software and theory
Abstract/Summary:
With the growth of the Internet, enormous volumes of data are being produced. Personalized recommendation helps users filter the information they need out of this mass of data, and how to improve recommendation quality is an active research topic. Collaborative filtering is one of the most commonly used recommendation techniques and can give users more accurate and personalized results. However, data sparsity degrades the quality of collaborative filtering recommendations, and the scalability problem makes traditional single-machine execution impractical. This thesis therefore proposes a new hybrid collaborative filtering algorithm that addresses both problems in a cloud computing environment.

The proposed hybrid algorithm improves on memory-based collaborative filtering in three ways. First, it improves the similarity calculation based on the Pearson correlation coefficient, the similarity measure commonly used in traditional collaborative filtering. The Pearson coefficient has a weakness: when two users have few commonly rated items, the coefficient tends to be inflated. To correct this, the thesis weights the Pearson coefficient by the ratio of the number of commonly rated items to the maximum number of items rated by either user. Second, the thesis introduces a parameter into the algorithm. With sparse data, the nearest-neighbor set can contain neighbors for which the number of common ratings between the two users or items is very small and far below the maximum number of ratings held by either one. Such neighbors contribute redundant and unreliable scores to the recommendation. The parameter is therefore based on the same ratio, the number of commonly rated items to the maximum number of rated items, and is used to determine whether a candidate qualifies as a nearest neighbor. Finally, a hybrid framework is designed in which item-based recommendation results fill in the user-based prediction when the nearest neighbors are judged to be inadequate.

In addition, the thesis implements the collaborative filtering algorithm in distributed form. The Hadoop cloud computing platform is a reliable, efficient, and scalable framework for distributed processing of large data sets, so it can support recommendation over massive data. Because Hadoop's MapReduce programming model differs from traditional programming styles, the distributed implementation is decomposed into a series of MapReduce jobs organized in three modules: a data set preprocessing module, a user-based algorithm module, and an item-based algorithm module. The hybrid collaborative filtering algorithm is obtained by integrating the three modules.

To verify the algorithm's performance, the experiments use the MovieLens data set provided by GroupLens and the Netflix competition data set, with mean absolute error, precision, and coverage together serving as the evaluation criteria. The experimental results show that the hybrid collaborative filtering algorithm outperforms the traditional collaborative filtering algorithm in both accuracy and personalization.
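The similarity correction and the neighbor test described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the dictionary representation of ratings, the choice to compute means over the commonly rated items only, and the reading of the parameter as a lower bound on the overlap ratio are all assumptions made for illustration.

```python
from math import sqrt


def weighted_pearson(ratings_u, ratings_v):
    """Return (weighted similarity, overlap ratio) for two users.

    ratings_u and ratings_v map item ids to ratings (assumed representation).
    The weight is: number of commonly rated items divided by the larger of
    the two users' rating counts, as described in the abstract.
    """
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0, 0.0

    ratio = len(common) / max(len(ratings_u), len(ratings_v))

    # Assumption: means are taken over the commonly rated items only.
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)

    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))

    # Pearson is undefined when either user has zero variance; use 0.0 then.
    pearson = num / (den_u * den_v) if den_u and den_v else 0.0
    return ratio * pearson, ratio


def is_reliable_neighbor(ratio, threshold):
    """Hypothetical neighbor test: keep a neighbor only if its overlap
    ratio reaches the threshold (assumed interpretation of the parameter)."""
    return ratio >= threshold
```

For example, a user who rated four items and a user who rated two of those same items with identical deviations from the mean have a raw Pearson coefficient of 1.0 over the two shared items, and the weight 2/4 reduces it to about 0.5. A neighbor that fails the overlap test would be the case in which the hybrid framework falls back on item-based results.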
Keywords/Search Tags: Collaborative Filtering Algorithm, Cloud Computing, Hadoop, Similarity, Distributed