Font Size: a A A

Similarity Top-k Query For Large-scale Dynamic Graph

Posted on:2019-08-11Degree:MasterType:Thesis
Country:ChinaCandidate:X Q LiFull Text:PDF
GTID:2428330545954777Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
The graph data has property of describing the various objects and their relationships in the real world,so it has been widely used in many fields,such as social network processing,intelligent transportation network and biological molecules.The similar nodes query of some similar nodes has always been a hotspot problem in the field of graphs.The traditional graph node similarity calculation method is usually more efficient on small scale or static graph.It's not only to calculate the similarity between graph nodes,but also to rank and get the Top-k results in order to satisfy the demand.With the popularity of the Internet and the development of big data,the number of graph nodes is increasing with each passing day,and the scale of graph nodes is growing bigger.The data of graph notes update more and more frequently,so it's unable to adapt to large-scale dynamic graph and the processing efficiency is very low.On account of the problems,the paper improves the classical SimRank algorithm in the graph node similar processing area and lucubrates the similar node Top-k query algorithm in large-scale dynamic graphs.Firstly,the thesis differentiates the similar relationship according to the distance of the similarity and defines three similar relationships: direct-similarity,minor-similarity and generic-similarity.Secondly,according to the relationship of n(the number of nodes satisfied the direct-similarity with query node)and k,the corresponding algorithm is used to search for similar nodes respectively.Of which,when the value of k-n is bigger than a certain number,because it may be necessary to evaluate the candidate set that is more than 3-stepsaway from query node,so the thesis select the single-direction graph to improve the query efficiency.When the graph data is dynamic,use the trigger-update method,the thesis determines the degree of correlation between the update and the query.The method of updating timely will be utilized on the condition of it intimate degree of correlation,or the method of updating at regular time will be utilized.The thesis queries in two aspects:similar node query for large-scale graphs and for large-scale dynamic graphs.Themain content of this thesis is as follows:(1)Improving SimRank algorithm.To avoid that the sum of “long distance relationship” is bigger than that of “short distance relationship” due to the superposition in original SimRank algorithm,and make it more practical,the thesis defines three similar relationships: direct-similarity,minor-similarity and generic-similarity according to the distance of the similarity relationships,also presents similarity calculation methods corresponding to three similar relationships.(2)When there is generic-similarity relationship between the nodes and the query node,the thesis uses the method of extracting single-direction graph to improve the query efficiency and reduce data redundancy.(3)Solving the update and maintain problems for node similarity Top-k query in large-scale graphs.The thesis uses the snapshot method to record the changes in the graph over a period of time.The trigger update policy is utilized,that is to update timely when the updated graph is relevant to the query;on the contrary,updating at regular time will be utilized,and after merging the updates,the updated graph is unified.(4)Taking experiments on simulated data sets and real data sets which prove that the algorithm achieved the validity and feasibility of similar node Top-k query in large-scale dynamic graphs with good performance.
Keywords/Search Tags:SimRank, similar node Top-k query, large-scale graph, dynamic graph
PDF Full Text Request
Related items