Efficient data management and keyword-based association discovery on graph data of large scale

Posted on:2015-05-16

Degree:Ph.D

Type:Dissertation

University:Indiana University

Candidate:Zhou, Mo

GTID:1478390020452363

Subject:Computer Science

Abstract/Summary:

Graphs are widely used to model problems in many domains, such as bioinformatics, cheminformatics, and the Semantic Web. We address how to store and query graph data efficiently, and how to express and efficiently answer complex search queries.

Existing graph storage and query evaluation techniques mostly store graph data in relational tables and translate graph queries into SQL queries. The mismatch between the rigid relational model and the flexible graph model prevents these techniques from preserving the semantics of graph data while achieving high storage efficiency and high query efficiency at the same time. We propose to take advantage of the mature storage and query evaluation techniques developed for semi-structured data by decomposing graph data into XML trees that are stored in an XML repository. A graph query is transformed into XML queries and evaluated in the XML repository. Our experimental results show that the RDF-to-XML decomposition can meet all three criteria.

We studied search applications in bioinformatics, health informatics, and social networks, and observed that finding paths that satisfy constraints in a graph is critical to these search scenarios. We abstract such search requests and formally define the problem of constraint acyclic path (CAP) discovery. We study how to express CAP queries and propose a new graph query language, constraint SPARQL (cSPARQL), which can express CAP search queries as well as more complex pattern-matching queries combined with CAP discovery. We propose efficient algorithms for the CAP discovery problem: the constraint DFS algorithms (cDFS and ecDFS) are based on DFS graph traversal with efficient pruning of search branches; localized Search & Join (S&J) uses local information to limit the search range and prune more effectively. We implement the algorithms in a prototype system, Conkar, that can be applied to multiple domains, e.g., drug discovery.
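
To make the idea of constrained acyclic path discovery concrete, the following is a minimal sketch of a DFS that enumerates simple (cycle-free) paths and prunes branches that violate a constraint. It is not the dissertation's cDFS or ecDFS: the function name `constrained_acyclic_paths`, the adjacency-list representation, the length bound `max_len`, and the node predicate `node_ok` are all assumptions chosen for illustration, since the abstract does not specify the constraint types or pruning rules.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical representation: adjacency list mapping a node to its successors.
Graph = Dict[str, List[str]]


def constrained_acyclic_paths(
    graph: Graph,
    source: str,
    target: str,
    max_len: int,
    node_ok: Callable[[str], bool],
) -> Iterator[List[str]]:
    """Yield simple paths from source to target with at most max_len edges.

    Every intermediate node must satisfy node_ok. Both constraints are
    illustrative assumptions, not the constraints defined in the dissertation.
    """
    path = [source]
    on_path = {source}

    def dfs(node: str) -> Iterator[List[str]]:
        if node == target:
            yield list(path)  # copy, because path is mutated during backtracking
            return
        if len(path) - 1 >= max_len:
            return  # prune: the edge budget is already used up
        for nxt in graph.get(node, []):
            if nxt in on_path:
                continue  # keep the path acyclic
            if nxt != target and not node_ok(nxt):
                continue  # prune: intermediate node violates the constraint
            path.append(nxt)
            on_path.add(nxt)
            yield from dfs(nxt)
            path.pop()
            on_path.remove(nxt)

    yield from dfs(source)


if __name__ == "__main__":
    g = {"a": ["b", "c"], "b": ["d"], "c": ["d", "b"], "d": ["a"]}
    for p in constrained_acyclic_paths(g, "a", "d", 2, lambda n: True):
        print(" -> ".join(p))
```

Tracking visited nodes per path (rather than globally) is what lets the search report every qualifying path instead of only the first one reaching each node; the early `return` and `continue` statements stand in for the branch pruning that the abstract mentions.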

Keywords/Search Tags:

Graph, Discovery, CAP, Search, Efficient, XML

Related items

1	Efficient Approximate Nearest Neighbor Search Based On Navigating Spreading-out Graph
2	Design And Implementation Of Service Discovery Function Based On Interface Matching In Web Service Search Engine
3	Efficient information discovery and retrieval in wireless ad hoc networks
4	The Relevant Technologies Research On Deep Web Source Discovery
5	Design of a structural search engine using a graph-based knowledge discovery system
6	Research On Graph Similarity Search Based On Frequent Sub-patterns
7	Efficient external-memory graph search for model checking
8	Memory-efficient graph search applied to multiple sequence alignment
9	The Design And Implementation Of Entity Information Discovery Methods Based On Cooperative Search
10	Study And Implementation Of Attribute Discovery Oriented Collaborative And Iterative Search System