Research On Genre-Based Hybrid Collaborative Filtering Algorithm

Posted on: 2016-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wu
Full Text: PDF
GTID: 2308330461984240
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet and electronic commerce, the explosive growth of information has made it difficult for users to choose from an overwhelming set of options. Recommender systems are important mechanisms for helping users deal with this information overload problem. Collaborative Filtering (CF) is one of the most popular recommendation algorithms in both research and commercial settings. Its biggest advantage is that it requires no prior knowledge of item content and can handle unstructured items such as music, movies, and photos. However, it often suffers from the sparsity problem.
To alleviate this issue, we propose a novel CF algorithm named Genre-based Hybrid Collaborative Filtering (GHCF). Our algorithm improves recommendation accuracy in the following three crucial aspects. First, we use genre information to classify items and build an item-genre matrix, and then build a user-genre matrix from the user-item matrix and the item-genre matrix. This process maps the high-dimensional user-item space to a lower-dimensional user-genre space. Second, we adjust the similarity formula for users (or items) by adding a correlation weight based on the number of co-rated genres (or the number of users who rated both items). The adjusted similarity calculation overcomes the potential inaccuracy caused by the sparsity problem. Third, we introduce a new missing-data prediction strategy that uses item-based CF to fill in the value when a rating from one of the target user’s neighbors is missing during the user-based CF process. In addition, we build a Recently Interested Genre Cloud (RIGC) for each user in order to track their interests more accurately.
We use an extension of the MovieLens 10M dataset as the data source. Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) are used to compare the prediction quality of our approach with that of other methods. The parameters are obtained using cross-validation.
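The first two steps described above (building a user-genre matrix and weighting similarity by the number of co-rated genres) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not give the exact formulas, so averaging ratings per genre, the weight min(n, gamma) / gamma, the threshold name `gamma`, and the function names `build_user_genre_matrix` and `weighted_pcc` are all hypothetical choices made for this example.

```python
from collections import defaultdict
from math import sqrt


def build_user_genre_matrix(ratings, item_genres):
    """Average each user's ratings over the items that belong to each genre.

    ratings:     {user: {item: rating}}, holding only observed ratings.
    item_genres: {item: set of genre names}.
    Returns {user: {genre: mean rating}}; genres a user never rated are absent.
    Using the mean as the aggregate is an assumption of this sketch.
    """
    profiles = {}
    for user, rated in ratings.items():
        sums = defaultdict(float)
        counts = defaultdict(int)
        for item, rating in rated.items():
            for genre in item_genres.get(item, ()):
                sums[genre] += rating
                counts[genre] += 1
        profiles[user] = {g: sums[g] / counts[g] for g in sums}
    return profiles


def weighted_pcc(profile_a, profile_b, gamma=3):
    """Pearson correlation over co-rated genres, damped when the overlap is small.

    gamma is a hypothetical threshold: with fewer than gamma co-rated genres,
    the correlation is scaled down by n_common / gamma (assumed weighting).
    Means are taken over the co-rated genres only.
    """
    common = sorted(set(profile_a) & set(profile_b))
    n_common = len(common)
    if n_common < 2:
        return 0.0  # correlation is undefined on fewer than two points
    mean_a = sum(profile_a[g] for g in common) / n_common
    mean_b = sum(profile_b[g] for g in common) / n_common
    cov = sum((profile_a[g] - mean_a) * (profile_b[g] - mean_b) for g in common)
    var_a = sum((profile_a[g] - mean_a) ** 2 for g in common)
    var_b = sum((profile_b[g] - mean_b) ** 2 for g in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant profile has no defined correlation
    pcc = cov / sqrt(var_a * var_b)
    return min(n_common, gamma) / gamma * pcc


# Toy data for illustration only (not taken from MovieLens).
ratings = {
    "u1": {"i1": 5, "i2": 3, "i4": 1},
    "u2": {"i1": 4, "i4": 1},
    "u3": {"i1": 1, "i2": 1, "i4": 5},
}
item_genres = {
    "i1": {"Action"},
    "i2": {"Action", "Comedy"},
    "i3": {"Comedy"},
    "i4": {"Comedy"},
}
profiles = build_user_genre_matrix(ratings, item_genres)
similarity_u1_u2 = weighted_pcc(profiles["u1"], profiles["u2"], gamma=3)
```

In this toy data u1 and u2 share only two genres, so their raw correlation is scaled by 2/3. That damping is the point of the weight: a similarity computed from very few co-rated entries is trusted less.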
Experimental results show that the proposed GHCF algorithm provides significant improvements in all cases over the baseline algorithms: User-based Collaborative Filtering using PCC (UPCC) and Item-based Collaborative Filtering using PCC (IPCC). The maximum improvement rate reaches 22% for MAE and 28% for RMSE. We therefore believe that GHCF can effectively alleviate the sparsity problem and improve prediction accuracy.
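For reference, the two error measures and a relative improvement rate can be computed as in the sketch below. MAE and RMSE follow their standard definitions. The formula in `improvement_rate` (error reduction divided by baseline error) is an assumption, since the abstract does not state how the improvement rate is calculated, and the numbers are toy values rather than results from the thesis.

```python
import math


def mae(predicted, actual):
    """Mean Absolute Error over paired predictions and true ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Squared Error over paired predictions and true ratings."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def improvement_rate(baseline_error, new_error):
    """Relative error reduction against a baseline (assumed definition)."""
    return (baseline_error - new_error) / baseline_error


# Toy values for illustration only; these are not results from the thesis.
actual = [4, 3, 5, 2]
baseline_pred = [3, 3, 3, 4]
new_pred = [4, 2, 4, 3]

mae_gain = improvement_rate(mae(baseline_pred, actual), mae(new_pred, actual))
rmse_gain = improvement_rate(rmse(baseline_pred, actual), rmse(new_pred, actual))
```

Because RMSE squares each error before averaging, it penalizes large individual errors more heavily than MAE, which is one reason the two improvement rates can differ for the same pair of algorithms.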
Keywords/Search Tags: recommendation system, data sparsity, hybrid collaborative filtering, missing data prediction