Font Size: a A A

Studies On Reachability Query Technologies For Large Graph Data

Posted on:2015-11-06Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y F ZhangFull Text:PDF
GTID:1318330482955739Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Reachability query is one of the most fundamental issues in graph data mining and management. It is widely applied to many related areas, such as social networks, bio-information networks, traffic networks, semantic web and so on, and has been researched for decades. From the early stage online search to the present application of index methods, many valuable achievements have been made. Given a source node and a destination node, the target of reachability query is to find the two vertices'reachability state. Details include determining if there are paths between them, obtaining the shortest distance between them and querying if the arrangement of the path label set between them follows a certain rule. Because this kind of operation is the basis of other applications, its efficiency would affect their performance. Although there are many excellent methods in this field, they confront the problem of scalability along with the rapid expansion of the scale of data, especially with the advent of the big data age. Hence, in order to further improve the efficiency of reachability query, more researches are needed in this field.This dissertation analyses the feature of graph data models in the real world and discovers the useful law, and on this basis, classifies current research results into three groups, i.e., unconstraint reachability query, distance constraint query and regular expression-based query. The main research harvests are as follows:(1) Utilizing the reachability trunk to answer unconstraint reachability queryMost traditional methods construct their indices directly on original graphs. This made these approaches hard to be expended, and only can handle smaller scale graph data with tens of thousands vertices. For this reason, this dissertation puts forward a new reachability trunk framework. The main source of the theory is the phenomenon that most of real world's data graphs are generally sparse in global yet dense in local. They have vertices in a group structure that the vertices within the groups have higher density of edges while vertices among groups have lower density of edges. Utilizing this insight, we first partition the original graph into different small groups. Then we pick up a central vertex that has reachability relationship with other vertices in the group from each group. Take advantage of these central vertices, we can create a subgraph, named reachability trunk, that reserves all the topological information of the original graph. This subgraph is not the final index. However, its size is much smaller than the original graph and the structure is very sparse. Consequently, we can use those traditional methods to continue the index construction process by the trunk. The experimental results indicate, compare to similar efficient approaches, our method not only can ensure the query efficiency but also is superior to other algorithms in construction time and index size.(2) Utilizing the shortest path trunk and multilevel trunk to answer distance constraint queryThe fundamentals of the reachability trunk and the shortest path trunk are the same. The only difference is that each edge of the shortest path trunk has a weight, i.e., the shortest distance between the trunk vertices. Like the reachability trunk, the shortest path trunk also can reduce the original graph's size and form a small-scale subgraph with sparse structure, which is very good for other approaches'index constructing. The experiment result shows that this structure can improve the process capacity of other methods. However, although the reachability/shortest path trunk can compress the size of the original graph, it still needs other algorithms to assist to continue the index construction. If the scale of graph data keeps on increasing, this approach might also face the same problem of scalability. This dissertation proposes another index strategy, i.e., the multilevel trunk-labeling scheme. The main idea of this method is that instead of using other algorithms to construct the index on the trunk, we recursively utilize our framework to construct subgraphs for each level of trunk. During the operation, we compute labels for each vertex. Finally, reachability and distance queries can be answered by these labels efficiently. Experimental results show that this method can handle the distance query on undirected complex networks that generally can be dealt with other methods inefficiently.(3) Utilizing index to answer regular expression-based queryWe make a concrete analysis and study of edge-labeled graph data, and present two index structures for the regular expression-based reachability query. One is the generalized-list-based compact BFS adjacent list. This method first constructs a BFS adjacent list for the original graph. Then we cluster the adjacent vertices with the same label as far as possible into a virtual node. In the end, a compact adjacent list in the form of generalized list is created. This index structure can retain all the path information of the original graph and can answer queries efficiently. The second method is to solve one of the flaws of the first method. Because the first approach needs to use all the vertices to construct the index, the index space cost is too big. By using the principal of vertex set cover, we first get an approximate minimum vertex set cover by a 2-approximation algorithm. Based on it, a compact index structure is constructed to preserve the edge label information. Because it uses only part of vertices of the original graph, the index size is compressed effectively. Experimental results show that the query efficiency of the two methods is much better than other common methods.
Keywords/Search Tags:unconstraint reachability query, distance constraint query, regular expression-based reachability query, reachability trunk, multilevel community center
PDF Full Text Request
Related items