Research And Implementation Of Graph Mining Platform Based On Pregel-like Framework

Posted on:2017-08-23

Degree:Master

Type:Thesis

Country:China

Candidate:Y Wang

Full Text:PDF

GTID:2348330518995965

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

Graphs,as an important type of unstructured data,have stronger representation in the aspect of semantics and structure compared to link tables and tree structures.Many real-world issues can be described using a graphical structure,and the processing and applications of graph data is required in almost all cases.But with the explosive growth of information and the development of social network,resulting in the scale of graph-based data has increased significantly.How to mine and analyze big data better has attracted much attention.Cloud computing is one of the effective methods to solve massive data mining tasks and improve the ability of mining mass data.Therefore,we propose a system based on Pregel-Like framework,which is used in graph mining.We choose HDFS to support for the storage of large scale graphs,which is an efficient,safe and high fault-tolerant file system.And we choose BSP model as the cloud computing framework to compute graph algorithms with large scale,whose unique super-step mechanism is suitable for iterative computations.Our platform integrates a variety of graph-mining algorithms and provides the visual display of results of algorithms and topology of graphs.This paper focuses on researching on the classical graph-mining algorithms and some improved algorithms.We design and implement parallel algorithms to compute attributes of graphs and to sort vertexes in graphs based on the underlying cloud framework,and compare the results of improved algorithms with classical algorithms combined with actual datasets.Finally,we propose a new community detection method based on K-shell.The contents of them are as follows.Firstly,we design and implement parallel K-shell decomposition and computation of semi-local center,both of which are attributes of a graph.The two attributes have better result on finding important vertexes of a graph compared to degree and have less computation complexity than betweenness.We also analyze the related distribution between the two attributes and degree of actual datasets.Secondly,we design and implement parallel LeaderRank and SALSA,both of which are graph sorting algorithms.Compared against the PageRank algorithm,the LeaderRank algorithm converges faster which has better ability to identify influential vertexes and has strong anti-jamming capability.And the SALSA algorithm has better ability to measure the spread capacity of vertexes than the HITS algorithm.Thirdly,we propose a community detection method based on the K-shell decomposition by which the graph scale is reduced and the efficiency is improved.And we verify the results with benchmark graphs.

Keywords/Search Tags:

large-scale graph, BSP, graph attributes, graph ranking, graph clustering

PDF Full Text Request

Related items

1	Research On Large Graph Aggregation Algorithm Based On Finite Memory
2	A Research On Graph Computation Platform In A Single PC By Application Of Graph Algorithm Tuned Graph Representation Format
3	Research On Fast Graph Clustering Algorithm On Large-Scale Data
4	Research On The Large-scale Evolution Graph Data Partition Method Based On Vertex-Cut
5	Research On Key Techniques Of Large Scale Graph Mining Based On GPU Hybrid Architecture
6	Research On Key Technologies Of Large-Scale Graph Partition Stratigies
7	Large-Scale Graph Visual Analytic
8	Research And Implementation Of Virtual Machine Management System For Large Scale Graph Processing
9	Study On Graph Sampling Algorithm For Graph Clustering Characteristic
10	Storage And Analysis Systems For Large-Scale Time-Evolving Graphs