Font Size: a A A

Pattern Match On The Large Graph Databases

Posted on:2011-07-13Degree:MasterType:Thesis
Country:ChinaCandidate:C X JieFull Text:PDF
GTID:2178360305992568Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
The data management problem nearly exists in all the fields of human activity. In the information age, it is very important for all the people to manage large amount of data resource by computer. Relational database has gained great success in many fields, but nowadays, the complexity of many kinds of data has been beyond the processing capacity of relational database. Graph data is one of such complex data types. It has very strong expressing ability and can express the semantics of many real things very well, such as social networks, XML documents, biochemistry molecules, protein-protein networks, and so on. So, query and management of graph data has attracted many researchers' attention.The problem of pattern matching in graph database requires that, given a query graph, all the appearance locations in the database should be found. It is one of the key problems in graph database and also one of the most useful problems. Some researchers and developers has provides several solutions to this problem. But, most of them either focus on the "small graph" databases, that is, supposing that some graphs in database can be completely loaded into main memory and the whole matching process can be also executed in memory; or mainly focus on the basic query in large graph database, for the pattern matching, they usually use join operation in relational database to implement it. This paper mainly focuses on the problem of how to execute pattern matching query efficiently in a large, disk-based graph database. The main contributions of this paper are listed below:(1) It provides a large graph database design and implements important parts in it.(2) It provides a novel disk-based pattern matching algorithm. In this paper, the problem of pattern matching is converted into the problem of joining many temporary tables. Pointing to the defects of "breadth first" and "depth first" strategy, this paper suggests a "full forward-empty-backward" strategy. This strategy can complete the matching procedure without writing out any intermediate results to the disk. Some disk-based graph specified problems, such as, the methods of storing and accessing large amount of temporary data, cache strategy, etc, are also deeply discussed in this paper.(3) Some optimization methods are provided for our algorithm framework. The effects of different joining processing orders to the query execution efficiency are discussed in this paper and a measuring method of the goodness of a joining processing order is provided. A method of selecting a better joining order is also provided in this paper. Parallel executing of our algorithm and some off-line optimization methods are discussed in this paper.The experiments on a real dataset show that our algorithm and optimization methods are very effective.
Keywords/Search Tags:graph database, pattern matching, query optimization
PDF Full Text Request
Related items