Research On Collaborative Filtering Algorithm Integrating User Clustering And Improved Similarity

Posted on: 2022-11-02; Degree: Master; Type: Thesis
Country: China; Candidate: X H Zeng
GTID: 2518306773481184; Subject: FINANCE
Abstract/Summary:
With the rapid development of Internet technology and the explosive growth of online information resources, society has entered the era of big data. The increasingly serious problem of information overload makes it difficult for users to find the personalized resources they need. Personalized recommendation technology is widely used because it can filter massive amounts of information accurately and help users obtain personalized resources quickly. In fields such as e-commerce, movies, and short videos, it has become an essential and effective way to address information overload. Its large-scale commercial application has promoted the development of business, and business development has in turn placed higher demands on recommendation technology.

At present, collaborative filtering is the most widely and successfully used personalized recommendation technique for addressing information overload. However, the traditional collaborative filtering algorithm considers only a single factor, user rating similarity, when calculating the similarity between users. This makes the similarity calculation defective, so the accuracy of the recommendation algorithm fails to meet users' needs. At the same time, as the volume of web data grows rapidly, the recommendation algorithm takes too long to run and its performance degrades when analyzing large amounts of data; this is the scalability problem of recommendation systems. To alleviate these two problems, which limit the wider adoption of recommendation systems, this thesis makes improvements in the following two aspects.

(1) To address the insufficient recommendation accuracy caused by popularity bias in collaborative filtering, this thesis proposes an improved collaborative filtering algorithm based on similarity calculation. First, the original similarity calculation is improved using the periodicity of user rating times. Second, to account for popularity bias, a penalty factor for popular items is added to the Pearson similarity calculation. Finally, the rating-periodicity feature and the penalty factor are fused into a new similarity formula that yields more accurate results, thereby improving the recommendation accuracy of the collaborative filtering algorithm.

(2) To address the poor scalability and performance of collaborative filtering, this thesis proposes a K-means collaborative filtering recommendation algorithm with optimized initial cluster centers. The distance between every pair of samples is calculated, and the two closest points among all samples are selected to form a set. The sample with the shortest distance to the set, computed with a point-to-set distance formula, is then added to the set repeatedly until the number of points in the set is greater than or equal to a threshold, defined as the ratio of the total number of data points in the sample to the number of clusters. The set is then removed from the sample set, and the above steps are repeated until the number of sets equals the number of clusters. The mean of each set is used as an initial center. Collaborative filtering is then applied within each of the resulting user clusters, which improves scalability and performance by reducing the search space for the target user and the time complexity of the algorithm.

To verify the effectiveness of the algorithm, this thesis carries out experiments and comparative analysis on the movie rating dataset provided by Netflix. The experimental results show that, compared with commonly used similarity calculation methods, the improved similarity calculation method STS reduces MAE and RMSE by an average of 3.91% and 2.81%, respectively, indicating that the STS algorithm improves recommendation accuracy to a certain extent. Compared with the traditional collaborative filtering algorithm, the proposed K-STS algorithm improves running efficiency by 55.63% and the F1 score by 47.34%. Compared with collaborative filtering based on the K-means algorithm, efficiency is slightly lower, but accuracy increases by 48.78%. These results indicate that the improved K-STS algorithm can effectively improve recommendation accuracy and alleviate poor system scalability to a certain extent.
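
The initial-center selection described in aspect (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the point-to-set distance formula, so the sketch assumes Euclidean distance and takes the distance from a point to a set to be the minimum distance to any member of the set. The function name `initial_centers` and the handling of a single leftover point are likewise assumptions made for illustration.

```python
import numpy as np


def initial_centers(points: np.ndarray, k: int) -> np.ndarray:
    """Pick up to k initial K-means centers by growing sets of nearby points.

    Hypothetical helper: Euclidean distance and min-linkage point-to-set
    distance are assumptions, not details taken from the abstract.
    """
    n = len(points)
    threshold = n / k  # ratio of sample count to cluster count
    remaining = list(range(n))  # indices of samples not yet assigned to a set
    centers = []

    for _ in range(k):
        if not remaining:
            break
        if len(remaining) == 1:
            # Assumption: a single leftover sample becomes its own center.
            centers.append(points[remaining[0]])
            remaining = []
            continue

        sub = points[remaining]
        # Pairwise Euclidean distances among the remaining samples.
        dist = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=2)
        np.fill_diagonal(dist, np.inf)  # ignore self-distances

        # Seed the set with the two closest samples.
        i, j = np.unravel_index(np.argmin(dist), dist.shape)
        members = [int(i), int(j)]

        # Grow the set until it holds at least `threshold` samples.
        while len(members) < threshold and len(members) < len(remaining):
            d_to_set = dist[:, members].min(axis=1)  # min distance to any member
            d_to_set[members] = np.inf  # never re-select a current member
            members.append(int(np.argmin(d_to_set)))

        # The mean of the set is one initial center.
        centers.append(sub[members].mean(axis=0))

        # Remove the set from the sample pool before the next round.
        chosen = set(members)
        remaining = [idx for pos, idx in enumerate(remaining) if pos not in chosen]

    return np.array(centers)


if __name__ == "__main__":
    # Toy data (made up for illustration): two well-separated groups.
    demo = np.array(
        [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float
    )
    print(initial_centers(demo, k=2))
```

The returned array could then be passed as starting centers to any K-means implementation that accepts explicit initial centers. Building the full pairwise distance matrix costs memory quadratic in the number of samples, so a sketch like this only suits small datasets.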
Keywords/Search Tags: Collaborative filtering, Scalability, K-means clustering, Popularity bias, Similarity calculation