
Research On High-dimensional Sparse Data In Collaborative Filtering Algorithm

Posted on: 2020-10-02  Degree: Master  Type: Thesis
Country: China  Candidate: H M Lu  Full Text: PDF
GTID: 2518306104496134  Subject: Software engineering
Abstract/Summary:
Recommender systems arose to meet users' individual needs, and collaborative filtering, the most widely used recommendation algorithm, has great research value. High-dimensional sparse data in collaborative filtering biases similarity calculation and score prediction and makes neighbor selection inefficient, which seriously degrades the quality of the algorithm. Existing research on this problem has the following shortcomings: work on improving similarity calculation does not consider score differences, does not measure how much a user likes or dislikes item attributes, and does not mine implicit interest; the high-dimensional nature of the data is ignored and nearest neighbors are selected from the entire data set, which lengthens running time; and work on improving score prediction does not consider that the similarity between users differs from item to item.

Starting from the principles of collaborative filtering and the related theory, this thesis addresses the problem in three ways:

(1) Similarity calculation is enriched with information beyond the rating matrix. Information entropy is introduced to measure the amount of information contained in score differences and is combined with the score differences to obtain a score-difference similarity; fuzzy sets are used to fuzzify each single score and measure the user's like and dislike of item attributes, giving an explicit interest similarity; and matrix factorization is added to mine users' implicit interests, giving an implicit interest similarity. These three similarities are combined with the original modified (adjusted) cosine similarity into a comprehensive similarity, which alleviates the data sparsity problem.

(2) Users are clustered with an improved K-Means algorithm that optimizes the selection of the initial centroids, and nearest neighbors are selected within the target user's cluster, which improves running efficiency and alleviates the high-dimensionality problem.

(3) Because the similarity between users differs from item to item, an item-specific trust degree is proposed and fused with the comprehensive similarity to obtain an item-specific similarity for score prediction, which further alleviates the data sparsity problem.

The algorithm is evaluated on the classic MovieLens data set. Compared with similar algorithms, it achieves a lower mean absolute error (MAE), which indicates improved recommendation accuracy, and the improved K-Means algorithm takes less running time, which improves efficiency. The algorithm thus alleviates the high-dimensional sparse data problem to a certain extent and improves recommendation quality. Finally, the algorithm is applied to movie recommendation to verify that it is effective and feasible in practice.
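
As a rough illustration of the baseline that the abstract builds on, the following is a minimal sketch of user-based collaborative filtering: a mean-centered cosine similarity (taken here as a stand-in for the modified cosine similarity mentioned above) and a prediction step that looks only at a restricted candidate set, such as the members of the target user's cluster. It is not the thesis's algorithm: the entropy, fuzzy-set, matrix-factorization, and trust components are omitted, and the function names, the toy rating matrix, and the convention that 0 means "not rated" are all assumptions made for the example.

```python
import numpy as np

# Toy users x items rating matrix; 0 means "not rated" (an assumption).
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def user_mean(R, u):
    """Mean of the scores user u actually gave."""
    return R[u][R[u] > 0].mean()

def adjusted_cosine(R, u, v):
    """Mean-centered cosine similarity of users u and v over co-rated items."""
    both = (R[u] > 0) & (R[v] > 0)
    if not both.any():
        return 0.0
    du = R[u, both] - user_mean(R, u)
    dv = R[v, both] - user_mean(R, v)
    denom = np.linalg.norm(du) * np.linalg.norm(dv)
    return float(du @ dv / denom) if denom > 0 else 0.0

def predict(R, u, i, candidates, k=2):
    """Predict user u's score for item i from the k most similar
    candidates who rated i (candidates could be u's cluster members)."""
    sims = [(adjusted_cosine(R, u, v), v)
            for v in candidates if v != u and R[v, i] > 0]
    top = sorted(sims, reverse=True)[:k]
    norm = sum(abs(s) for s, _ in top)
    if norm == 0:
        return float(user_mean(R, u))  # fall back to the user's own mean
    dev = sum(s * (R[v, i] - user_mean(R, v)) for s, v in top)
    return float(user_mean(R, u) + dev / norm)

# Predict user 0's score for item 2 using only user 1 as a neighbor.
estimate = predict(R, 0, 2, candidates=[1], k=1)
```

Restricting `candidates` to one cluster is what makes neighbor search cheaper than scanning every user; how the clusters are formed (for example, with K-Means) is left outside this sketch.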
Keywords/Search Tags:Collaborative filtering, Similarity, Scoring difference, Interest Believability
Related items