
Research On Node Similarity Measurement Method For Large Scale Dynamic Graph

Posted on: 2019-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: R F Duan
Full Text: PDF
GTID: 2428330545454764
Subject: Computer application technology
Abstract/Summary:
Graphs are a data structure commonly used in computer science, and they can effectively express a wide range of relationships among objects. A graph is more complex in structure and semantics than a linear table or a tree, and it has more general expressive power, with applications in road traffic, Web semantic analysis, social network analysis, geographic information networks, and so on [1-4]. More and more application scenarios need to process data structured as graphs. At the same time, the growing scale of graph data has brought many challenges for processing and analysis, so research on it is of great significance.

In the evolution of a large-scale dynamic graph, node similarity measurement and segmentation are regarded as basic problems in the study of graph relationships and have attracted many researchers. Most existing research focuses on static graphs, or on similar-subgraph queries and subgraph mining over cumulative dynamic graphs. This paper studies node similarity during the evolution of a large-scale dynamic graph. Because there are few studies on the similarity measurement and segmentation of nodes in large-scale dynamic graphs, this paper proposes a method for classifying node similarity in a large-scale dynamic graph, which consists of data preprocessing, node similarity calculation, and node similarity segmentation. To address the storage and processing of large-scale dynamic graphs, this paper uses the operators of the GraphX library in the Spark distributed computing framework, which encapsulate the basic graph computations and allow the algorithm to be implemented and run more effectively.

First, in the data preprocessing stage, the edge set and the vertex set of each snapshot taken during the evolution of the large-scale dynamic graph are obtained. The vertex set and the edge set are converted into two files, nodes.csv and edges.csv, which are then read by the GraphX operators.

Second, node similarity calculation is divided into the similarity calculation of adjacent nodes and the similarity calculation of non-adjacent nodes. The nodes.csv and edges.csv files serve as the input for calculating node similarity. From the edge set and the vertex set, the similarity of two adjacent nodes is calculated with GraphX. Once the similarity of adjacent nodes is known, the similarity of non-adjacent nodes is calculated from it. The algorithm is recursive, and together these two steps yield the similarity of both adjacent and non-adjacent nodes.

Third, using a clustering method with time-series constraints, the node similarity values are clustered into segments; different segmentations yield different similarity degrees within segments and between segments. For each segmentation result, a goodvalue score is calculated according to the evaluation formula for segmentation results, and the segmentation with the largest goodvalue is selected as the optimal segmentation result.

Finally, experiments on two data sets show that the algorithm has clear advantages in storage and execution efficiency. The optimal segmentation for each data set is then selected according to the goodvalue scores obtained on that data set.
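The abstract does not give the similarity formulas, so the following is only a minimal single-machine sketch of the adjacent and non-adjacent similarity steps, written in plain Python rather than GraphX. It rests on two assumptions that are not taken from the thesis: adjacent-node similarity is taken to be the Jaccard index of the two nodes' closed neighbourhoods, and non-adjacent similarity is taken to be the largest product of adjacent similarities along any simple path, computed recursively. All function names and the toy vertex and edge lists (standing in for the parsed nodes.csv and edges.csv) are hypothetical.

```python
def build_adjacency(nodes, edges):
    """Build an undirected adjacency map from a vertex list and an edge list."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def edge_key(u, v):
    """Order-independent dictionary key for an undirected edge."""
    return (u, v) if u <= v else (v, u)


def adjacent_similarity(adj):
    """ASSUMPTION: Jaccard index of closed neighbourhoods for each edge."""
    sim = {}
    for u, nbrs in adj.items():
        for v in nbrs:
            key = edge_key(u, v)
            if key not in sim:
                cu, cv = adj[u] | {u}, adj[v] | {v}
                sim[key] = len(cu & cv) / len(cu | cv)
    return sim


def pair_similarity(adj, edge_sim, src, dst, visited=frozenset()):
    """Similarity of any two nodes, derived recursively from edge similarities.

    ASSUMPTION: for non-adjacent nodes, take the best product of adjacent
    similarities over simple paths. This enumerates paths, so it is only
    suitable for small illustrative graphs.
    """
    if src == dst:
        return 1.0
    if dst in adj[src]:
        return edge_sim[edge_key(src, dst)]
    best = 0.0
    for n in adj[src]:
        if n not in visited:
            via = edge_sim[edge_key(src, n)] * pair_similarity(
                adj, edge_sim, n, dst, visited | {src}
            )
            best = max(best, via)
    return best  # stays 0.0 when no path connects src and dst


# Toy snapshot: a path 1-2-3-4 plus an isolated vertex 5.
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 3), (3, 4)]
adj = build_adjacency(nodes, edges)
edge_sim = adjacent_similarity(adj)
sim_1_4 = pair_similarity(adj, edge_sim, 1, 4)
```

In a distributed setting, the per-edge step would correspond to a neighbourhood aggregation over the graph, and the recursive step would need a bounded, iterative formulation in place of path enumeration.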
Keywords/Search Tags: distributed, large-scale graph, similarity, clustering