Font Size: a A A

Research Of Monte Carlo Based Incremental PageRank Algorithms

Posted on:2020-12-02Degree:MasterType:Thesis
Country:ChinaCandidate:Z X ZhanFull Text:PDF
GTID:2428330590977063Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Network is a powerful and versatile way to represent inter-object connections and contact patterns.Many systems in life can be expressed abstractly using network data.With the rapid development of the Internet and social networks,researches on networks have become more and more important.Among them,centrality has received a lot of attention,that is,which nodes are the most important or core in a network.In 1998,Google proposed a more rational centrality measure,the PageRank algorithm,and applied it to Google Search.Not only has Google Search been popular,but also the PageRank algorithm is widely known and widely used in sociology,physics and computer science.However in practice,many networks are very large and are constantly changing.In this case,PageRank algorithms that work only for static networks are insufficient,especially when it is desirable to track PageRank values in real-time rather than to wait for a batched computation.Therefore,we need a more efficient incremental PageRank tracking algorithm.Currently,there are two types of PageRank incremental algorithms: aggregation partitioning algorithms and Monte Carlo based algorithms.Theoretical studies have shown that cumulative error is unavoidable for aggregation partitioning algorithms.So we focus on the Monte Carlo type algorithms.Previous Monte Carlo type algorithms have neglected the case that random walks may revisit a same node,resulting in low precision.To solve this problem,we propose a more efficient Monte Carlo based algorithm for PageRank tracking on dynamic networks.A revisit probability model is also presented to provide theoretical support for our algorithm.For a graph with n nodes,the proposed algorithm maintains only nR random walk segments(R random walks starting from each node)in memory.The time cost to update PageRank scores for each graph modification is proportional to n/|E|(E is the edge set),and has nothing to do with the size of the network.So our algorithm is equally applicable to large-scale networks.Experiments on 5 real-world networks indicate that our algorithm is 1.3-30 times faster than state-of-the-art algorithms and does not accumulate any errors.
Keywords/Search Tags:PageRank, Monte Carlo, random walk, dynamic networks, incremental
PDF Full Text Request
Related items