
Research on Sparsity and Scalability Problems in Collaborative Filtering

Posted on: 2016-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: X H Li
Full Text: PDF
GTID: 2308330479484679
Subject: Control Science and Engineering
Abstract/Summary:
As society moves from the information technology (IT) era to the data technology (DT) era, the amount of information people face every day is growing rapidly, and information overload has become a major obstacle to the development of industry. In electronic commerce especially, users have to spend more and more time choosing the goods they want from a wide variety of products. Recommendation systems effectively address this problem, and collaborative filtering algorithms in particular have been very successful. However, as the number of goods grows, the user-item rating matrix becomes increasingly sparse, which seriously reduces the accuracy of traditional collaborative filtering algorithms. In addition, because computing capability is limited, traditional collaborative filtering algorithms suffer from poor computational efficiency and poor scalability when facing big data.

This thesis studies the data sparsity and poor scalability of collaborative filtering algorithms on big data. It fills the user-item rating matrix to reduce data sparsity and uses a distributed algorithm to improve scalability.

First, having too few ratings makes the user-item rating matrix sparse, which reduces the accuracy of collaborative filtering. The thesis proposes an assist-factor similarity derived from the overall distribution of each item's rating vector, combines the assist factor with a traditional similarity measure, and proposes a collaborative filtering algorithm based on the assist factor. For items with too few common ratings, this improves the otherwise poor recommendation accuracy. Experiments show that the algorithm effectively eases data sparsity and improves recommendation accuracy.

Second, to deal with the poor scalability of collaborative filtering on big data, the thesis presents a distributed implementation of the collaborative filtering recommendation algorithm based on Hadoop. The implementation multiplies users' preference vectors by the item co-occurrence matrix to obtain recommended items, and cluster nodes can be added dynamically to improve scalability. For the multiplication, the algorithm uses an improved partial product method instead of traditional matrix multiplication, which avoids a large number of useless calculations caused by null entries in the matrix and improves the utilization of computing resources. Finally, experiments show that this algorithm effectively improves computational efficiency and scales well on big data.
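The multiplication of a preference vector by a co-occurrence matrix can be illustrated with a small single-machine sketch. This is not the thesis's Hadoop implementation: the function names `build_cooccurrence` and `recommend`, the sample ratings, and the reading of "partial product" as "accumulate only the products of non-null entries" are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import permutations


def build_cooccurrence(user_ratings):
    """Count, for each ordered item pair (i, j), how many users rated both."""
    cooc = defaultdict(dict)
    for ratings in user_ratings.values():
        for i, j in permutations(ratings, 2):
            cooc[i][j] = cooc[i].get(j, 0) + 1
    return cooc


def recommend(user, user_ratings, cooc, top_n=3):
    """Score unrated items by summing partial products over stored entries only.

    Only items the user actually rated and only item pairs that actually
    co-occur are visited, so null entries never take part in the calculation.
    """
    prefs = user_ratings[user]
    scores = defaultdict(float)
    for item, rating in prefs.items():
        for other, count in cooc.get(item, {}).items():
            if other not in prefs:  # do not recommend already-rated items
                scores[other] += count * rating  # one partial product
    # Highest score first; ties broken by item name for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical sample data: user -> {item: rating}
ratings = {
    "u1": {"A": 5.0, "B": 3.0},
    "u2": {"A": 4.0, "B": 2.0, "C": 5.0},
    "u3": {"B": 4.0, "C": 3.0},
}
cooc = build_cooccurrence(ratings)
print(recommend("u1", ratings, cooc))
```

In a MapReduce setting, the inner loop would roughly correspond to a map step that emits one partial product per stored entry and a reduce step that sums them per candidate item, but that mapping is a sketch of the general idea rather than the thesis's actual job design.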
Keywords/Search Tags: collaborative filtering recommendation algorithm, data sparsity, scalability, distributed computing, Hadoop