Research On Collaborative Filtering Recommendation Algorithm Based On User Reviews And Clustering

Posted on: 2020-08-07
Degree: Master
Type: Thesis
Country: China
Candidate: S W Li
GTID: 2428330575477783
Subject: Computer software and theory
Abstract/Summary:
A recommender system uses a recommendation algorithm to predict a user's score for an unrated item by mining the known interactions between users and items, and recommends the items with the highest predicted scores as those the user may be interested in. This lets users quickly find items they like despite today's information overload, and it also brings profit to the company. Among the many recommendation algorithms for score prediction, collaborative filtering has become the most widely studied and applied, because it fully exploits the similarity between users and between items to predict users' scores for unrated items. Early collaborative filtering algorithms considered only scores, so they mined too little information during prediction, which resulted in low prediction accuracy and in predictions that could not be reasonably explained. As user review information on the web has grown richer, more and more researchers have added it to collaborative filtering models in an attempt to improve prediction accuracy and, at the same time, to make the recommendation results explainable to some degree. As a result, collaborative filtering based on user reviews has gradually become a research hotspot in the field of recommender systems.

However, most collaborative filtering algorithms based on user reviews use various methods to connect a user's rating with the review text, yet they treat users and items as independent individuals with no association among them, and they do not mine the commonality among users or among items. In real application scenarios, users within the same product category can be divided into groups according to their preferences. Users in each group have similar preferences, and users in different groups have different
preferences. Similarly, items within the same product category can be divided into groups according to their attributes. The items in each group have similar attributes, and the items in different groups have different attributes. Therefore, in a collaborative filtering algorithm, users with similar preferences and items with similar attributes should be gathered into sub-groups, which both brings the predicted scores closer to the actual situation and further improves prediction accuracy. Depending on the product type, the clustering of users and items falls into two situations. In the first, the complexity of user preferences differs greatly from the complexity of item attributes. In the second, the two complexities differ very little. This thesis addresses both situations, and its specific work is as follows.

First, a matrix factorization recommendation algorithm based on latent Dirichlet allocation (LDA-MF) is proposed for product types in which the complexity of user preferences differs greatly from the complexity of item attributes. The algorithm combines a latent Dirichlet allocation model, used to mine user reviews, with a matrix factorization model, used for score prediction. Users are first clustered; the rating matrix is then segmented according to the clustering result so that the scores of users in the same cluster are placed in one sub-matrix; the parameters of a matrix factorization model are trained on each sub-matrix; and finally the users' scores for unrated items are predicted. Experiments on eight data sets published on the Amazon website verify that LDA-MF significantly improves score prediction accuracy over the comparison algorithm. In addition, this thesis extracts user feature words based on the results of the
LDA-MF experiments and uses these words to explore the interpretability of the recommendation results, which verifies that the LDA-MF model can produce interpretable recommendations.

Second, a recommendation algorithm based on User-Item Co-Clustering (UICC), which clusters users and items simultaneously, is proposed for product types in which the complexity of user preferences differs little from the complexity of item attributes. The algorithm is modeled on ideas from Bayesian networks and topic models: users and items are clustered into groups according to the users' ratings and reviews of the items. Each group has its own review topic distribution and score distribution, and the predicted score is finally generated by repeated sampling based on the Bayesian posterior probability. Experiments on three data sets published on the Amazon website verify that UICC significantly improves score prediction accuracy over its comparison algorithm. In addition, users and items are grouped according to the results of the UICC experiments, a matrix factorization algorithm is then applied within each group to predict scores, and the results are compared with the ungrouped case. This verifies that the UICC model can serve as an algorithm framework that improves the prediction accuracy of matrix factorization, and that it has a degree of scalability.

The two algorithms proposed in this thesis differ in their characteristics as well as in the product types they target. LDA-MF has a simple structure and is fast to solve, so it is suitable for fast score prediction and analysis on large data sets. UICC involves many factors and assigns users and items to groups probabilistically. It is suitable for more detailed prediction and
analysis of smaller data sets, and it can be used as an algorithm framework. The two algorithms are complementary in nature, and selecting the appropriate one for each situation can ensure both efficiency and score prediction accuracy.
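
As a rough illustration of the LDA-MF steps described in the abstract (cluster users, segment the rating matrix by cluster, train one matrix factorization model per sub-matrix, then predict unrated items), the following Python sketch may help. It is not the thesis's implementation. The cluster labels in `user_cluster` are assumed to come from an LDA-based clustering of user reviews that is not shown here, the biased SGD factorization in `train_mf` is one common choice rather than a model confirmed by the abstract, and all names, hyperparameters, and toy ratings are hypothetical.

```python
import random
from collections import defaultdict


def train_mf(ratings, n_factors=2, lr=0.02, reg=0.02, epochs=300, seed=0):
    """Fit a biased matrix factorization by SGD on (user, item, rating) triples."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # mean rating of this sub-matrix
    users = {u for u, _, _ in ratings}
    items = {i for _, i, _ in ratings}
    # Small random latent factors; biases start at zero.
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    bu = dict.fromkeys(users, 0.0)
    bi = dict.fromkeys(items, 0.0)
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = mu + bu[u] + bi[i] + sum(a * b for a, b in zip(p[u], q[i]))
            err = r - pred
            # Gradient steps with L2 regularization on biases and factors.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return {"mu": mu, "bu": bu, "bi": bi, "p": p, "q": q}


def predict(model, u, i):
    """Predict a rating; unseen users or items fall back to the mean plus known biases."""
    score = model["mu"] + model["bu"].get(u, 0.0) + model["bi"].get(i, 0.0)
    if u in model["p"] and i in model["q"]:
        score += sum(a * b for a, b in zip(model["p"][u], model["q"][i]))
    return score


def train_per_cluster(ratings, user_cluster):
    """Segment the ratings by user cluster and train one MF model per sub-matrix."""
    by_cluster = defaultdict(list)
    for u, i, r in ratings:
        by_cluster[user_cluster[u]].append((u, i, r))
    return {c: train_mf(rows) for c, rows in by_cluster.items()}


# Hypothetical toy data: cluster 0 prefers item "a", cluster 1 prefers item "b".
ratings = [
    ("u1", "a", 5), ("u1", "b", 1), ("u2", "a", 5),
    ("u3", "a", 1), ("u3", "b", 5), ("u4", "b", 5),
]
# Assumed output of an LDA-based clustering of user reviews (not computed here).
user_cluster = {"u1": 0, "u2": 0, "u3": 1, "u4": 1}
models = train_per_cluster(ratings, user_cluster)

# u2 never rated "b": compare the cluster-0 predictions for u2 on both items.
print(round(predict(models[0], "u2", "a"), 2), round(predict(models[0], "u2", "b"), 2))
# u4 never rated "a": compare the cluster-1 predictions for u4 on both items.
print(round(predict(models[1], "u4", "a"), 2), round(predict(models[1], "u4", "b"), 2))
```

Because each model sees only the ratings of one user cluster, an unrated item is scored from the behaviour of like-minded users alone. In this toy setup the prediction for u2 on "b" should come out lower than for "a", and the reverse should hold for u4.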
Keywords/Search Tags: Recommender system, Text Mining, Collaborative Filtering, Matrix Factorization, Clustering