Collaborative Filtering Algorithm Research Based On Page Interest Degree

Posted on: 2010-10-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 2178360272999233
Subject: Management Science and Engineering
Abstract/Summary:
With the popularization of the Internet and the rapid development of e-commerce, information overload has made it hard for consumers to find the information they want. Researchers and enterprises are paying increasing attention to how to meet consumers' needs quickly and accurately. The recommendation system is an effective way to address information overload: it is an intelligent agent system that automatically finds information matching a consumer's interests. In practice, recommendation systems suggest products and give consumers information that helps them decide what to purchase. Under increasingly fierce competition, e-commerce recommendation systems can raise sales by converting browsers into buyers, increasing cross-selling, and building loyalty that reduces customer loss. Because of these functions, recommendation systems have been very successful in both research and practice. However, as e-commerce expands and the numbers of consumers and items grow, recommendation systems run into problems such as data sparsity, cold start, and scalability, which badly degrade recommendation performance. To address these challenges, this thesis studies the key technology of recommendation systems, the collaborative filtering (CF) algorithm. The main research work is as follows:

1. Analyze the key recommendation technology, the collaborative filtering (CF) algorithm. CF is the most successful e-commerce recommendation algorithm.
The basic idea of CF is to predict how a user would rate a given item from the ratings of other users. Depending on the underlying approach, CF algorithms fall into two categories: memory-based CF and model-based CF. Nearest-neighbor search (KNNs) is the key technique of CF, and almost all CF algorithms use it. Traditional CF has three steps: form the rating matrix, search for the nearest neighbors, and produce recommendations or predict ratings. The analysis shows that, as web site structures grow more complex and the number of users keeps increasing, CF faces challenges such as data sparsity, cold start, and the new-user problem.

2. To address these challenges, this thesis puts forward a novel CF algorithm based on the page interest degree. First, to address data authenticity, we derive the initial page interest degree matrix from the web log (an implicit data source), which avoids the "mendacious" (false) ratings that explicit user rating can produce. The thesis analyzes, classifies, and summarizes the factors that affect the page interest degree and groups them into two main factors: the time factor (page access time / page size) and the frequency factor (page access frequency). The weights of the two factors are obtained by principal component analysis, which yields a formula for calculating the user-item rating. Applying this formula to the processed log data gives the initial page interest degree matrix.

Second, to deal with the sparsity of the initial matrix, the new algorithm applies Singular Value Decomposition (SVD). We use SVD to predict the vacancies (missing entries) of the page interest degree matrix and so obtain a matrix without vacancies. SVD was chosen because it can mine latent relationships among the factors even when data is sparse, and because the matrix produced by SVD has less noise than the original, which makes the relationships among the factors easier to find.

Based on the page interest degree matrix without vacancies, we use the Slope One algorithm to predict the page interest degree of the pages in a given testing set. Slope One is a relatively new CF algorithm similar to item-based CF. It does not use KNNs to select neighbors; instead, it calculates the mean difference between every pair of items and uses these mean differences to predict ratings. We call the resulting procedure the SlopeOne_After_SVD algorithm. It combines the merits of SVD and Slope One and can address the problems of data sparsity, new users, and scalability.

3. Apply the SlopeOne_After_SVD algorithm to real data and analyze the test results. The tests have two steps: (1) In SVD, it is important to select the number of retained dimensions, known as "k". If k is too small, SVD cannot capture the important structure of the page interest degree matrix; if k is too large, SVD cannot remove noise effectively. We therefore compare prediction accuracy for different values of k. (2) Compare the prediction accuracy of SlopeOne_After_SVD with that of Slope One. Both algorithms are tested with 5-fold cross-validation (that is, with a different testing set selected in each fold). The conclusion is that, when data is sparse, SlopeOne_After_SVD predicts more accurately than Slope One across the different training and testing sets.

In summary, this thesis proposes a method for calculating user-page ratings from the web log, together with the SlopeOne_After_SVD algorithm. In the case of data sparsity, the proposal has some theoretical significance and application value. It still has shortcomings, however: the page interest degree formula is tied to our data and lacks generality, and SlopeOne_After_SVD does not solve the cold-start problem. These problems remain to be addressed in future work.
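
The abstract does not give the page interest degree formula itself, so the following is only a minimal sketch of how the time factor and frequency factor might be combined with PCA-derived weights. The min-max scaling, the use of the first principal component's absolute loadings as weights, and all names (`page_interest`, `pca_weights`, the sample values) are assumptions for illustration, not the thesis's method.

```python
import numpy as np

def min_max(col):
    """Scale a 1-D array to [0, 1]; a constant column maps to zeros."""
    span = col.max() - col.min()
    return (col - col.min()) / span if span > 0 else np.zeros_like(col)

def pca_weights(features):
    """Assumed weighting: normalized absolute loadings of the first PC."""
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    first_pc = np.abs(eigvecs[:, np.argmax(eigvals)])
    return first_pc / first_pc.sum()

def page_interest(access_seconds, page_bytes, visit_counts):
    """Interest score per (user, page) record from two log-derived factors."""
    access_seconds = np.asarray(access_seconds, dtype=float)
    page_bytes = np.asarray(page_bytes, dtype=float)
    visit_counts = np.asarray(visit_counts, dtype=float)
    # Time factor: page access time divided by page size.
    time_factor = min_max(access_seconds / page_bytes)
    # Frequency factor: page access frequency.
    freq_factor = min_max(visit_counts)
    features = np.column_stack([time_factor, freq_factor])
    weights = pca_weights(features)
    return features @ weights, weights

# Hypothetical log records, one entry per (user, page) pair.
access_seconds = [30.0, 120.0, 5.0, 60.0]
page_bytes = [2000.0, 4000.0, 1000.0, 1500.0]
visit_counts = [3, 10, 1, 4]
scores, weights = page_interest(access_seconds, page_bytes, visit_counts)
```

Because both factors are scaled to [0, 1] and the weights are non-negative and sum to 1, each score also lies in [0, 1], which makes the resulting matrix comparable to a normalized rating matrix.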
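
The SVD step that fills the vacancies can be sketched as follows. The abstract does not say how missing entries are seeded before decomposition; this sketch assumes they are seeded with the column (page) mean, and the function name `svd_fill` and the sample matrix are hypothetical.

```python
import numpy as np

def svd_fill(matrix, k):
    """Fill NaN entries with a rank-k truncated SVD reconstruction.

    Assumes every column has at least one observed value.
    """
    missing = np.isnan(matrix)
    # Assumption: seed each gap with the mean of its column's observed values.
    col_means = np.nanmean(matrix, axis=0)
    seeded = np.where(missing, col_means, matrix)
    u, s, vt = np.linalg.svd(seeded, full_matrices=False)
    # Keep only the k largest singular values and their vectors.
    approx = (u[:, :k] * s[:k]) @ vt[:k, :]
    # Observed entries stay as they are; only the vacancies are predicted.
    return np.where(missing, approx, matrix)

# Hypothetical user x page interest matrix with vacancies (NaN).
interest = np.array([
    [0.9, np.nan, 0.2, 0.4],
    [0.8, 0.7, np.nan, 0.5],
    [np.nan, 0.6, 0.3, np.nan],
    [0.1, 0.2, 0.9, 0.8],
])
filled = svd_fill(interest, k=2)
```

Keeping the observed entries unchanged is a design choice of this sketch; replacing the whole matrix with the rank-k approximation would additionally smooth the observed values.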
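
The Slope One prediction described above, based on mean differences between item pairs, can be illustrated with the commonly used weighted variant. The abstract does not say which variant the thesis uses, so the weighting by co-rating count is an assumption, and the function name and the small rating matrix are illustrative only.

```python
import numpy as np

def slope_one_predict(ratings, user, item):
    """Weighted Slope One prediction for one (user, item) pair; NaN = missing."""
    rated = ~np.isnan(ratings)
    num, den = 0.0, 0
    for other in np.flatnonzero(rated[user]):
        if other == item:
            continue
        # Users who rated both the target item and the other item.
        both = rated[:, item] & rated[:, other]
        count = int(both.sum())
        if count == 0:
            continue
        # Mean difference between the two items over those users.
        dev = np.mean(ratings[both, item] - ratings[both, other])
        num += (dev + ratings[user, other]) * count
        den += count
    return num / den if den else float("nan")

# Rows are users, columns are items A, B, C; NaN marks a missing rating.
ratings = np.array([
    [5.0, 3.0, 2.0],
    [3.0, 4.0, np.nan],
    [np.nan, 2.0, 5.0],
])
pred_user2_item_a = slope_one_predict(ratings, user=2, item=0)  # 13/3, about 4.33
pred_user1_item_c = slope_one_predict(ratings, user=1, item=2)  # 10/3, about 3.33
```

For the third user and item A, the mean differences are 0.5 against B (two co-raters) and 3 against C (one co-rater), so the prediction is ((0.5 + 2) * 2 + (3 + 5) * 1) / 3 = 13/3.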
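
The comparison across values of k can be sketched with a 5-fold split over the observed entries. This is a simplification under stated assumptions: it scores the rank-k SVD reconstruction directly rather than the full SlopeOne_After_SVD pipeline, it uses mean absolute error (MAE) as the accuracy measure, which the abstract does not name, and the data is synthetic rather than the thesis's log data.

```python
import numpy as np

def rank_k_fill(matrix, k):
    """Rank-k SVD fill; gaps are seeded with column means (assumption)."""
    missing = np.isnan(matrix)
    counts = (~missing).sum(axis=0)
    sums = np.where(missing, 0.0, matrix).sum(axis=0)
    # Fall back to the global mean for columns with no observed value.
    global_mean = np.nanmean(matrix)
    col_means = np.where(counts > 0, sums / np.maximum(counts, 1), global_mean)
    seeded = np.where(missing, col_means, matrix)
    u, s, vt = np.linalg.svd(seeded, full_matrices=False)
    approx = (u[:, :k] * s[:k]) @ vt[:k, :]
    return np.where(missing, approx, matrix)

def five_fold_mae(matrix, ks, folds=5, seed=0):
    """Mean held-out MAE for each candidate k, averaged over the folds."""
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(~np.isnan(matrix))
    order = rng.permutation(len(rows))
    results = {}
    for k in ks:
        errors = []
        for fold in np.array_split(order, folds):
            train = matrix.copy()
            # Hide this fold's entries so they serve as the testing set.
            train[rows[fold], cols[fold]] = np.nan
            filled = rank_k_fill(train, k)
            diff = filled[rows[fold], cols[fold]] - matrix[rows[fold], cols[fold]]
            errors.append(np.abs(diff).mean())
        results[k] = float(np.mean(errors))
    return results

# Synthetic low-rank matrix with about 30% of entries removed.
rng = np.random.default_rng(1)
dense = rng.random((8, 2)) @ rng.random((2, 6))
sparse = np.where(rng.random(dense.shape) < 0.3, np.nan, dense)
mae_by_k = five_fold_mae(sparse, ks=[1, 2, 3])
best_k = min(mae_by_k, key=mae_by_k.get)
```

The same loop could wrap a Slope One predictor applied to the filled matrix to mirror the second test step, comparing it with Slope One run on the unfilled matrix.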
Keywords/Search Tags: Page Interest Degree, Collaborative Filtering Algorithm, Singular Value Decomposition, Slope One Algorithm