
Advanced methods for managing transient and persistent data

Posted on: 2006-03-27
Degree: Ph.D.
Type: Dissertation
University: University of California, Santa Barbara
Candidate: Qiao, Lin
Full Text: PDF
GTID: 1458390008470266
Subject: Computer Science
Abstract/Summary:
The opportunity for traditional DBMSs is being chipped away from multiple directions. To remain viable, traditional DBMSs need to rapidly expand their applicability to new areas. In contrast to traditional data sets, transient data sets, such as data streams, consist of an unbounded volume of data that keeps changing. Another class of data sets has extremely simple data types: persistent data stored on disks, such as files and blocks, together with their query languages, which are more commonly thought of as access protocols (NFS, SCSI, etc.). A big opportunity will be lost if we continue to put little or no effort into examining what it would take to support these data types.

To provide aggregation efficiently over dynamic streaming data, we first propose a dynamic summary representation over data streams, the R(elaxed) Hist(ogram), or RHist, which is constructed in one pass. RHist is maintained in memory and is used to answer aggregation queries directly. We present an integrated approach that adapts the histogram to changes in the data distribution as well as changes in the query patterns. Going one step beyond RHist, we propose a two-dimensional histogram, called the hybrid histogram, as a summary representation over a sliding time window for multi-valued data domains. The hybrid histogram is built from uni-dimensional, non-overlapping exponential histograms. To adapt on the fly to the changing data distribution in the sliding time window, the hybrid histogram uses a dynamic partitioning strategy over the data value domains.

To manage persistent data efficiently using DBMS technology, we first propose STORAGEDB, a paradigm for leveraging a DBMS to build a storage virtualization engine. Building on this basic paradigm, we extend STORAGEDB with various advanced features, such as recovery, online resource reallocation, storage compression, and encryption. All of these features are of great importance in practice and can be implemented cheaply and reliably in STORAGEDB. We further study the two major issues in STORAGEDB: DBMS CPU and space overheads. To reduce these overheads and further improve performance, we implemented SVL by changing many DBMS internal methods. Techniques including a new aggregation function, communication buffer caching, and DBMS tuning approaches are developed. System performance, measured by CPU consumption and latency, is improved by over an order of magnitude for the embedded DBMS block virtualization solution. The overall performance of our embedded DBMS solution is comparable to that of a commercial block virtualization engine, while it delivers more of the functionality required by storage systems.
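
For readers unfamiliar with the exponential histograms that the hybrid histogram is said to build on, the following is a minimal sketch of the standard textbook one-dimensional exponential histogram for approximately counting 1-bits in a sliding window. It is not the dissertation's RHist or hybrid histogram, whose details the abstract does not give. The class name, method names, and the `max_per_size` parameter are illustrative assumptions.

```python
class ExponentialHistogram:
    """Approximate count of 1s among the last `window` stream items.

    Textbook sketch only; all names here are hypothetical and not taken
    from the dissertation.
    """

    def __init__(self, window, max_per_size=2):
        self.window = window
        self.max_per_size = max_per_size  # allowed buckets per size
        self.buckets = []  # newest first: (timestamp of newest 1, size)
        self.time = 0

    def add(self, bit):
        self.time += 1
        # Drop buckets whose newest 1 has left the window.
        while self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.buckets.pop()
        if bit:
            self.buckets.insert(0, (self.time, 1))
            self._merge()

    def _merge(self):
        size = 1
        while True:
            idx = [i for i, (_, s) in enumerate(self.buckets) if s == size]
            if len(idx) <= self.max_per_size:
                break
            # Merge the two oldest buckets of this size into one of
            # double size, keeping the more recent timestamp.
            newer, older = idx[-2], idx[-1]
            self.buckets[newer] = (self.buckets[newer][0], size * 2)
            del self.buckets[older]
            size *= 2

    def estimate(self):
        if not self.buckets:
            return 0
        total = sum(s for _, s in self.buckets)
        # Only part of the oldest bucket may still be in the window,
        # so count half of it.
        return total - self.buckets[-1][1] // 2
```

With at most two buckets per size, memory grows logarithmically in the window length, and the estimate is within roughly 50% of the true count; allowing more buckets per size tightens the error at the cost of more memory.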
Keywords/Search Tags: DBMS, Data, Persistent