Research On Node Similarity Measurement Method For Large Scale Dynamic Graph

Posted on:2019-01-16

Degree:Master

Type:Thesis

Country:China

Candidate:R F Duan

Full Text:PDF

GTID:2428330545454764

Subject:Computer application technology

Abstract/Summary:

Graphs are a data structure widely used in computer science and can effectively express rich relationships among objects. They are more complex in structure and semantics than linear lists and trees and have more general expressive power, with applications such as road traffic, Web semantic analysis, social network analysis, and geographic information networks [1-4]. More and more application scenarios require processing graph-structured data. At the same time, the growing scale of graph data poses many challenges for analysis, so research in this area is of great significance. In the evolution of large-scale dynamic graphs, node similarity measurement and segmentation are fundamental problems in the study of graph relationships and have been studied by many scholars. Most traditional research focuses on static graphs, or on similar-subgraph query and subgraph mining over cumulative dynamic graphs. This thesis studies node similarity during the evolution of large-scale dynamic graphs.

Because there are few studies on node similarity measurement and segmentation for large-scale dynamic graphs, this thesis proposes a method for classifying node similarity in large-scale dynamic graphs. The method consists of three stages: data preprocessing, node similarity calculation, and node similarity segmentation. To address the storage and processing of large-scale dynamic graphs, the thesis uses the operators of the GraphX library in the Spark distributed computing framework, which encapsulate basic graph computations and make the algorithm more efficient to implement and run. First, in the data preprocessing stage, the edge set and the vertex set of a snapshot taken during the evolution of the large-scale dynamic graph are obtained. The two sets are converted into two files, nodes.csv and edges.csv, which are then read by GraphX operators. Second, node similarity calculation is divided into the similarity calculation of adjacent nodes and the similarity calculation of non-adjacent nodes, with nodes.csv and edges.csv as input files. From the edge set and the vertex set, the similarity of two adjacent nodes is calculated with GraphX. Once the similarity of adjacent nodes is known, the similarity of non-adjacent nodes is calculated from it. The algorithm is recursive and ultimately yields the similarity of both adjacent and non-adjacent nodes. Third, using a clustering method with time-series constraints, the node similarities are clustered into different segments; different segmentations lead to different similarity degrees and inter-segment similarity degrees. For each segmentation result, the goodvalue score is calculated according to the evaluation formula for segmentation results, and the segmentation with the largest goodvalue is selected as the optimal segmentation result.

Finally, experiments on two data sets show that the algorithm has clear advantages in storage and execution efficiency. The optimal segmentation for each data set is then selected according to the goodvalue scores obtained on that data set.
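
The similarity-calculation stage described above can be illustrated with a small single-machine sketch. It is not the thesis's GraphX implementation, and the abstract does not state the similarity formulas, so everything specific here is an assumption made for illustration: the inline `EDGES_CSV` sample and its `src,dst` header, the use of Jaccard overlap of closed neighborhoods as the adjacent-node measure, and the rule that a non-adjacent pair takes the largest product of adjacent similarities along any connecting path. The segmentation step and the goodvalue formula are left out because the abstract gives no definition for them.

```python
import csv
import heapq
import io
from collections import defaultdict

# Hypothetical stand-in for the contents of edges.csv (assumed "src,dst" header).
EDGES_CSV = """src,dst
a,b
a,c
b,c
c,d
d,e
"""


def load_adjacency(csv_text):
    """Build an undirected adjacency map from edge-list CSV text."""
    adj = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        adj[row["src"]].add(row["dst"])
        adj[row["dst"]].add(row["src"])
    return dict(adj)


def adjacent_similarity(adj, u, v):
    """Assumed measure: Jaccard overlap of the closed neighborhoods of u and v."""
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def similarity_from(adj, source):
    """Similarity from `source` to every node reachable from it.

    Assumed propagation rule: a non-adjacent pair gets the largest product of
    adjacent similarities over all connecting paths (a max-product variant of
    Dijkstra's search). Direct neighbors keep their direct score.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap via negated scores
    while heap:
        neg_score, u = heapq.heappop(heap)
        score = -neg_score
        if score < best.get(u, 0.0):
            continue  # stale heap entry
        for v in adj[u]:
            cand = score * adjacent_similarity(adj, u, v)
            if cand > best.get(v, 0.0):
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    # Adjacent pairs report their direct similarity, not a multi-hop product.
    for v in adj[source]:
        best[v] = adjacent_similarity(adj, source, v)
    return best


adj = load_adjacency(EDGES_CSV)
sims = similarity_from(adj, "a")
for node in sorted(sims):
    print(node, round(sims[node], 3))
```

Because every adjacent similarity lies in [0, 1], path products never increase as a path grows, which is what lets the greedy max-product search settle each node once. A distributed version would replace the in-memory dictionary with graph operators, which is the role the abstract assigns to GraphX.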

Keywords/Search Tags:

Distributed, large scale graph, similarity, clustering

Related items

1	Research On Large-scale RDF Data Query Method Based On Graph Clustering
2	Research On Distributed Storage And Retrieval Technology Of Large-scale Knowledge Graph
3	Research And Application Of Clustering Algorithms For Large Scale Data Sets
4	Parallel algorithms for large-scale graph clustering on distributed memory architectures
5	Large-scale Distributed Graph Partitioning Algorithms
6	Research On Key Problems About Large-Scale Text Clustering
7	Research On Large Graph Aggregation Algorithm Based On Finite Memory
8	Research And Implementation Of Graph Mining Platform Based On Pregel-like Framework
9	Large Scale Dynamic Adaptive Graph Partition Algorithm
10	Research On Fast Graph Clustering Algorithm On Large-Scale Data