Hierarchical and semantic data management and querying for patient records and personal photos

Posted on: 2009-05-28
Degree: Ph.D.
Type: Thesis
University: Case Western Reserve University
Candidate: Elliott, Brendan David
Full Text: PDF
GTID: 2448390002492048
Subject: Computer Science
Abstract/Summary:
The demands of modern data management have recently stretched traditional relational database systems, with increased focus on flexible, semi-structured data formats. These formats, such as XML, enable elegant representation of complex hierarchical structures. In this thesis, we go beyond XML to focus on the challenges posed by a new generation of semi-structured data management problems, including genealogical data (pedigrees) used to track inherited diseases, semantically annotated personal photos, and more complex RDF data, such as longitudinal patient medical records used for clinical research. Specifically, we need new languages for querying, new models for storage, new types of indexes for efficient processing, and new techniques for query optimization. In addition, to bring about the dream of semantic data management, we need new tools for extending existing data with semantic annotations to enable effective querying. In the domain of pedigree data, we propose a new pedigree query language (PQL), along with processing methods based on efficient indexing schemes. Our system delivers significant performance improvements in processing queries for large pedigrees (5 to 77 times faster). For personal photos, we develop a prototype photo management system that hosts over 80,000 photos. We propose new metadata-based annotation suggestion methods that offer significantly improved recall (4 to 19% higher) and faster suggestion time (14 to 21 times faster). We also present techniques that use family and social networks for photo management and that are especially effective for bootstrapping new collections with a limited amount of initial metadata. Finally, to facilitate semantic querying over 200,000+ patient records stored as RDF and to support medical research, we propose an efficient method for evaluating semantic queries in the SPARQL query language. Our approach uses relational databases and processes queries significantly faster than previous work (up to 280 to 1900 times faster). We also propose an RDF query-optimization technique that uses carefully designed statistics and estimates of the contents of the RDF data to further improve performance. Again, the performance improvement is significant (up to 204 to 4700 times faster) while adding only modestly to the space requirements.
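As a rough illustration of what evaluating a SPARQL-style query on top of a relational database can look like, the sketch below stores RDF triples in a single three-column table and answers a two-pattern query with a self-join. This is a generic textbook layout, not the storage model, indexing scheme, or optimizer proposed in the thesis, which the abstract does not describe. The table name, the helper functions, and the sample patient data are all hypothetical.

```python
import sqlite3

# Hypothetical toy data: subject, predicate, object.
# The resource and predicate names are invented for illustration only.
TRIPLES = [
    ("patient:1", "hasDiagnosis", "dx:diabetes"),
    ("patient:1", "bornIn", "1950"),
    ("patient:2", "hasDiagnosis", "dx:asthma"),
    ("patient:2", "bornIn", "1972"),
    ("patient:3", "hasDiagnosis", "dx:diabetes"),
    ("patient:3", "bornIn", "1988"),
]


def build_store(triples):
    """Load triples into an in-memory SQLite table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    # A predicate-first index helps lookups of the form (?x, p, o).
    conn.execute("CREATE INDEX idx_pos ON triples (p, o, s)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return conn


def diabetic_birth_years(conn):
    """Relational equivalent of the SPARQL pattern:

        SELECT ?patient ?year WHERE {
            ?patient :hasDiagnosis dx:diabetes .
            ?patient :bornIn ?year .
        }

    Each triple pattern becomes one alias of the triples table, and the
    shared variable ?patient becomes the join condition.
    """
    sql = """
        SELECT t1.s, t2.o
        FROM triples AS t1
        JOIN triples AS t2 ON t2.s = t1.s
        WHERE t1.p = 'hasDiagnosis' AND t1.o = 'dx:diabetes'
          AND t2.p = 'bornIn'
        ORDER BY t1.s
    """
    return conn.execute(sql).fetchall()


conn = build_store(TRIPLES)
rows = diabetic_birth_years(conn)
```

In a layout like this, every additional triple pattern adds another self-join, so join order can matter a great deal. Statistics about the RDF data, as mentioned in the abstract, are one way an optimizer might choose that order.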
Keywords/Search Tags:Data, Semantic, Times faster, Querying, Personal, Patient, Records, Photos