Font Size: a A A

Efficient data management and keyword-based association discovery on graph data of large scale

Posted on:2015-05-16Degree:Ph.DType:Dissertation
University:Indiana UniversityCandidate:Zhou, MoFull Text:PDF
GTID:1478390020452363Subject:Computer Science
Abstract/Summary:
Graph has been widely used in modeling problems in many domains as Bioinformatics, Cheminformatics and the Semantic Web. We target at how to efficiently store and query graph data and how to express and efficiently answer complex search queries.;The existing graph storage and query evaluation techniques mostly store graph data in relational tables and transform graph queries into SQL queries. The mismatch of the rigid relational model and the flexible graph model prevents these techniques from preserving the semantics of graph data, having high storage efficiency and high query efficiency at the same time. We propose to take advantage of the mature storage and query evaluation techniques in the context of semi-structured data and propose to decompose graph data into XML trees to be stored in XML repository. The graph query is transformed into XML queries and evaluated in XML repository. Our experimental results show that the RDF-to-XML decomposition can meet all three criteria. We studied search applications in Bioinformatics, Health informatics and Social Networks. We observed that finding paths satisfying constraints in a graph is critical to these search scenarios. We abstract such search requests and formally define the problem of constraint acyclic path (CAP) discovery. We study how to express CAP queries and propose a new graph query language, constraint SPARQL (cSPARQL), to fulfill the need in expressing CAP search queries, as well as more complex pattern matching search queries cooperating with CAP discovery. We propose efficient algorithms to answer CAP discovery problem: constraint DFS algorithms (cDFS and ecDFS) are based on DFS graph traversal with efficient pruning on search branches; localized Search & Join (S&J) uses the local information to limit the search ranges and perform more effective pruning. We implement the algorithms in a prototype system-Conkar that can be applied to multiple domains, e.g. drug discovery.
Keywords/Search Tags:Graph, Discovery, CAP, Search, Efficient, XML
Related items