Font Size: a A A

Research And Implementation Of Graph Mining Platform Based On Pregel-like Framework

Posted on:2017-08-23Degree:MasterType:Thesis
Country:ChinaCandidate:Y WangFull Text:PDF
GTID:2348330518995965Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Graphs,as an important type of unstructured data,have stronger representation in the aspect of semantics and structure compared to link tables and tree structures.Many real-world issues can be described using a graphical structure,and the processing and applications of graph data is required in almost all cases.But with the explosive growth of information and the development of social network,resulting in the scale of graph-based data has increased significantly.How to mine and analyze big data better has attracted much attention.Cloud computing is one of the effective methods to solve massive data mining tasks and improve the ability of mining mass data.Therefore,we propose a system based on Pregel-Like framework,which is used in graph mining.We choose HDFS to support for the storage of large scale graphs,which is an efficient,safe and high fault-tolerant file system.And we choose BSP model as the cloud computing framework to compute graph algorithms with large scale,whose unique super-step mechanism is suitable for iterative computations.Our platform integrates a variety of graph-mining algorithms and provides the visual display of results of algorithms and topology of graphs.This paper focuses on researching on the classical graph-mining algorithms and some improved algorithms.We design and implement parallel algorithms to compute attributes of graphs and to sort vertexes in graphs based on the underlying cloud framework,and compare the results of improved algorithms with classical algorithms combined with actual datasets.Finally,we propose a new community detection method based on K-shell.The contents of them are as follows.Firstly,we design and implement parallel K-shell decomposition and computation of semi-local center,both of which are attributes of a graph.The two attributes have better result on finding important vertexes of a graph compared to degree and have less computation complexity than betweenness.We also analyze the related distribution between the two attributes and degree of actual datasets.Secondly,we design and implement parallel LeaderRank and SALSA,both of which are graph sorting algorithms.Compared against the PageRank algorithm,the LeaderRank algorithm converges faster which has better ability to identify influential vertexes and has strong anti-jamming capability.And the SALSA algorithm has better ability to measure the spread capacity of vertexes than the HITS algorithm.Thirdly,we propose a community detection method based on the K-shell decomposition by which the graph scale is reduced and the efficiency is improved.And we verify the results with benchmark graphs.
Keywords/Search Tags:large-scale graph, BSP, graph attributes, graph ranking, graph clustering
PDF Full Text Request
Related items