Efficient scientific data management over trees

Posted on:2008-03-10

Degree:Ph.D

Type:Dissertation

University:University of Pennsylvania

Candidate:Zheng, Yifeng

GTID:1448390005466210

Subject:Computer Science

Abstract/Summary:

Fueled by novel technologies capable of producing massive amounts of data, scientists face an explosion of information that must be rapidly analyzed and integrated with other data to form hypotheses and create knowledge. Success in science now hinges critically on the availability of computational and data management tools to meet these challenges.

Michael Stonebraker recently argued that the traditional database concept of "one size fits all," in which a single strategy is used to manage data across all applications, is no longer applicable in the database market. Nowhere is this truer than with scientific data, which differs significantly from the business data for which current database technology was developed.

My research focuses on the management of tree-structured scientific data, a type of scientific data that models an inherently hierarchical process or object. Due to its hierarchical structure, XML has become a common scientific data format (http://xml.gsfc.nasa.gov). However, XML's standard query languages, XPath and XQuery, are not well suited to many scientific applications, in particular computational linguistics and phylogenetic tree applications. I have devoted a significant portion of my research to supporting these two types of scientific applications efficiently. Specifically, I have studied and summarized the operations (queries) commonly performed on the data, analyzed why XML techniques cannot be easily applied, and designed and implemented data management systems for both types of applications.

Keywords/Search Tags:

Data, Applications

Related items

1	Efficient data scheduling for real-time large-scale data-intensive distributed applications
2	Simulation data learning and its applications on embedded processors
3	Research On Data Placement And Fault-Tolerant Scheduling For Applications Of Data Stream In Geo-distributed Clouds
4	Numerical modeling of GPR wavefields using ray-based, Fourier, and finite-difference algorithms with applications to field data
5	Theory and applications of data hiding in still images
6	Using XML views to improve data-independence of distributed applications that share data
7	Research On Adaptive Scheduling Method For Big Data Applications In Hybrid Clouds
8	An integration architecture for large scale Web applications involving workflow, data exchange, and knowledge bases
9	Privacy-preserving data analytics for big data applications
10	Study On The Heterogeneous Reconfigurable System On Chip Targeting At The Big Data Applications