Font Size: a A A

Single-source SimRank Computating And Its Application In Collaborative Filtering

Posted on:2020-05-28Degree:MasterType:Thesis
Country:ChinaCandidate:M Z ChenFull Text:PDF
GTID:2428330575958447Subject:Computer technology
Abstract/Summary:PDF Full Text Request
As a non-linear data structure that expresses the abstract relationship between ob-jects and objects,graphs have a more general expression ability and play an important role in our real life.In recent years,the scale of data is increasing with the rapid devel-opment of technologies such as the Internet,cloud computing,and Internet of Things."Big Data" has brought great opportunities and challenges to our life.At the same time the scale of graph data also grows rapidly with the development of recommendation system,information retrieval,and social network analysis.As the scale of the graph data increases,people find that huge graph data is of great value.Various Internet applications based on graph structure have emerged in recent years.In these applications,the most typical and fundamental task is to measure the similarity between objects.SimRank is a very popular model based on graph topology structure to measure the similarity between any two objects.The basic idea is that an object should be most similar to itself and two objects are similar if their neighbors are similar.However,computation of SimRank is often costly in both space and time due to the recursive dependency in the definition of SimRank,and the increase of the graph data also makes the problem more difficult.In this paper,we mainly focus on the computation of Single-source SimRank:given a query vertex,return the similarity between the vertex and any other vertices.We proposed and implemented a highly parallel algorithm called Probe Walk.Our ap-proach is improved and optimized based on the random walk model,consisting of an offline indexing process and an online query process.The processing of the index stage takes linear time and space,while the query phase of a single source vertex takes only constant time and space.We implement the algorithm to the popular distributed pro-cessing platform Spark.The results show that our algorithm has high accuracy and high efficiency.The similarity measurement is an important part of collaborative filtering.Be-cause the traditional similarity measurement is less effective in collaborative filtering and SimRank cannot be directly applied to a weighted bipartite graph,this paper intro-duces SimRank++ into collaborative filtering,and proposes a novel random walk based on a two-phase Monte Carlo simulation,which can be applied in large scale collabora-tive filtering computation.Experiments show that the prediction error of SimRank++on Movielens dataset is lower than that of Pearson correlation coefficient and Cosine similarity.
Keywords/Search Tags:SimRank, Random Walk, Spark, Distributed Algorithm, Collaborative Filtering
PDF Full Text Request
Related items