Font Size: a A A

The Computations Of PageRank Scores Based On GPGPU Environment

Posted on:2018-12-16Degree:DoctorType:Dissertation
Country:ChinaCandidate:Full Text:PDF
GTID:1318330536976260Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
As the volume of data being processed increases rapidly,how to measure the importance of the data is a big challenge,and becomes crtical research issue.PageRank algorithm is considered as a key tool to address this issue.PageRank score as a key parameter to evaluate the importance of data has been widely applied in many areas,such as search engine,bio-pharmaceuticals,and recommendation systems.We should solve the PageRank scores for all of the data in these applications.On the other hand,PageRank scores are considered for only several data in some conditions,including measuring the “semantic relatedness”,detecting spam emails,and paper retrieving.In this paper,we denote the problem related to the PageRank values of entire vertices as PageRank problem and the problem related to the PageRank values of only several vertices as local PageRank problem,respectively.How to efficiently to solve these kind of problems is of great practical significance.We look for the approaches to improve the performance of problems based on GPGPU environment.Due to the many cores architecture,GPU provides large stronger computing power of floating point than CPU.GPU has been widely used in high performance computing areas in recent years.Firstly,we propose a computing framework PCSR to improve the performance of the PageRank problem computations.PCSR classifies the graph into some vertices subsets to parallel process according to the out degree and in degree.We use static and dynamic parallelism strategies to overcome the bottleneck of single vertex computing in subsets containing middle degree and high degree vertices.In order to address the imbalanced workload of subsets contain low degree vertices,we merge the workloads of vertices into the workload of a single thread.Furthermore,we merge the computations of dangling vertices as an independent procession to accelerate computing.The results of experiments show that our implementations of PageRank algorithm are much more efficient than previous work under GPGPU environment.On the other hand,we characterize the local PageRank problem as solving one component of linear systems,and then we introduce Markov Chain Monte Carlo(MCMC)method to deal with numerical computing problem.Moreover,we investigate the properties of MCMC method for local PageRank computations.According to the properties of algorithm and GPGPU architecture,we adopt a serials of optimizations to enhance the performance of local PageRank computing,including variance-aware strategy,reordering strategy,Share Memory based strategy and uniform transition matrix strategy.We propose a variance-aware strategy based the property analysis and case analysis to overcome the imbalanced workload among vertices.Variance-aware strategy can efficiently adjust the computing resource for a vertex based the rate of change for variance of random walks.We then implement the strategy with a block level optimization,so each block can individually execute the strategy without relying on extra information from other blocks.In addition,we come up with reorder strategy not only to compress the storage space of dangling vertexes,also speed up the computations by reducing the computing complexity of Markov chain procession.Moreover,the determined low-discrepancy sequences are organized to load in the on-chip Shared Memory to accelerate fetching with a bank-wise schema in Shared Memory based strategy.In some environment,the structure of graph evolves frequently;a uniform transition matrix is proposed to address this challenge.We conduct various kinds of experiment to verify our strategies,and the results of experiments demonstrate that our strategies can significantly improve the performance of local PageRank computations.
Keywords/Search Tags:PageRank, Local PageRank, Monte Carlo method, GPGPU, Parallel computing
PDF Full Text Request
Related items