Single-source SimRank Computating And Its Application In Collaborative Filtering

Posted on:2020-05-28

Degree:Master

Type:Thesis

Country:China

Candidate:M Z Chen

Full Text:PDF

GTID:2428330575958447

Subject:Computer technology

Abstract/Summary:

PDF Full Text Request

As a non-linear data structure that expresses the abstract relationship between ob-jects and objects,graphs have a more general expression ability and play an important role in our real life.In recent years,the scale of data is increasing with the rapid devel-opment of technologies such as the Internet,cloud computing,and Internet of Things."Big Data" has brought great opportunities and challenges to our life.At the same time the scale of graph data also grows rapidly with the development of recommendation system,information retrieval,and social network analysis.As the scale of the graph data increases,people find that huge graph data is of great value.Various Internet applications based on graph structure have emerged in recent years.In these applications,the most typical and fundamental task is to measure the similarity between objects.SimRank is a very popular model based on graph topology structure to measure the similarity between any two objects.The basic idea is that an object should be most similar to itself and two objects are similar if their neighbors are similar.However,computation of SimRank is often costly in both space and time due to the recursive dependency in the definition of SimRank,and the increase of the graph data also makes the problem more difficult.In this paper,we mainly focus on the computation of Single-source SimRank:given a query vertex,return the similarity between the vertex and any other vertices.We proposed and implemented a highly parallel algorithm called Probe Walk.Our ap-proach is improved and optimized based on the random walk model,consisting of an offline indexing process and an online query process.The processing of the index stage takes linear time and space,while the query phase of a single source vertex takes only constant time and space.We implement the algorithm to the popular distributed pro-cessing platform Spark.The results show that our algorithm has high accuracy and high efficiency.The similarity measurement is an important part of collaborative filtering.Be-cause the traditional similarity measurement is less effective in collaborative filtering and SimRank cannot be directly applied to a weighted bipartite graph,this paper intro-duces SimRank++ into collaborative filtering,and proposes a novel random walk based on a two-phase Monte Carlo simulation,which can be applied in large scale collabora-tive filtering computation.Experiments show that the prediction error of SimRank++on Movielens dataset is lower than that of Pearson correlation coefficient and Cosine similarity.

Keywords/Search Tags:

SimRank, Random Walk, Spark, Distributed Algorithm, Collaborative Filtering

PDF Full Text Request

Related items

1	SimRank Computation On Large Graphs Based On Spark
2	Studies On Collaborative Filtering Recommendation Based On Trust Network And Random Walk
3	A Two-stage Collaborative Filtering Algorithm Based On Random Walking And Cluster-based Smoothing
4	Research And Application Of Random Walk Based Recommendation Techniques
5	A Study On Spark-based Distributed Collaborative Filtering And Its Tools
6	Application And Research Of Random Walk Model In Personalized Recommendation Technology
7	Research On Context-Aware Information Collaborative Filtering Recommendation Algorithm Based On Spark
8	Improvement And Implementation Of Collaborative Filtering Algorithm Based On Spark
9	The Research And Application Of Collaborative Filtering Algorithm Based On Spark
10	Research On Collaborative Filtering Recommendation Algorithm Based On Spark And System Implementation