Advanced methods for managing transient and persistent data

Posted on:2006-03-27

Degree:Ph.D

Type:Dissertation

University:University of California, Santa Barbara

Candidate:Qiao, Lin


GTID:1458390008470266

Subject:Computer Science

Abstract/Summary:

The opportunity for traditional DBMSs is being chipped away from multiple directions. To remain viable, traditional DBMSs need to rapidly expand their applicability to new areas. Unlike traditional data sets, transient data sets such as data streams are unbounded in volume and change continuously. Another type of data set consists of persistent data stored on disks with extremely simple data types, such as files and blocks; their query languages are more commonly thought of as access protocols, such as NFS and SCSI. A significant opportunity will be lost if we continue to put little or no effort into examining what it would take to support these data types.

To provide efficient aggregation over dynamic streaming data, we first propose RHist (Relaxed Histogram), a dynamic summary representation of a data stream that is constructed in one pass. RHist is maintained in memory and is used to answer aggregation queries directly. We present an integrated approach that adapts the histogram to changes in both the data distribution and the query patterns. Going one step beyond RHist, we propose a two-dimensional histogram, called the hybrid histogram, as a summary representation over a sliding time window for multi-valued data domains. The hybrid histogram is built on one-dimensional, non-overlapping exponential histograms. To adapt on the fly to the changing data distribution within the sliding time window, the hybrid histogram uses a dynamic partitioning strategy over the data value domains.

To manage persistent data efficiently using DBMS technology, we first propose STORAGEDB, a paradigm for leveraging a DBMS to build a storage virtualization engine. Building on this paradigm, we extend STORAGEDB with various advanced features, such as recovery, online resource reallocation, storage compression, and encryption.
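The abstract does not describe how STORAGEDB lays out its metadata. As a minimal sketch, assuming the virtualization layer keeps its logical-to-physical block map in an ordinary relational table, the Python code below uses the standard `sqlite3` module as a stand-in DBMS. The `block_map` table, its columns, the `BlockMapper` class, and the `move_device` reallocation helper are all hypothetical names chosen for illustration; they are not STORAGEDB's actual schema or interface.

```python
import sqlite3
from typing import Optional

# Hypothetical mapping table: STORAGEDB's real schema is not given in the
# abstract, so these table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS block_map (
    volume_id INTEGER NOT NULL,
    vblock    INTEGER NOT NULL,
    device_id INTEGER NOT NULL,
    pblock    INTEGER NOT NULL,
    PRIMARY KEY (volume_id, vblock)
)
"""


class BlockMapper:
    """Resolves virtual block addresses to physical ones through a DBMS table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(SCHEMA)
        self.conn.commit()

    def map_block(self, volume_id: int, vblock: int, device_id: int, pblock: int) -> None:
        # `with conn` commits on success and rolls back on an exception,
        # so each mapping change is applied atomically.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO block_map VALUES (?, ?, ?, ?)",
                (volume_id, vblock, device_id, pblock),
            )

    def resolve(self, volume_id: int, vblock: int) -> Optional[tuple[int, int]]:
        # Returns (device_id, pblock), or None if the block was never mapped.
        row = self.conn.execute(
            "SELECT device_id, pblock FROM block_map"
            " WHERE volume_id = ? AND vblock = ?",
            (volume_id, vblock),
        ).fetchone()
        return None if row is None else (row[0], row[1])

    def move_device(self, old_device: int, new_device: int) -> int:
        # Hypothetical reallocation step: repoint every block on old_device in
        # one transaction. Copying the data itself is out of scope, and the new
        # device is assumed to use the same physical block numbers.
        with self.conn:
            cur = self.conn.execute(
                "UPDATE block_map SET device_id = ? WHERE device_id = ?",
                (new_device, old_device),
            )
        return cur.rowcount


if __name__ == "__main__":
    mapper = BlockMapper(sqlite3.connect(":memory:"))
    mapper.map_block(volume_id=1, vblock=0, device_id=7, pblock=4096)
    print(mapper.resolve(1, 0))  # (7, 4096)
    mapper.move_device(7, 9)
    print(mapper.resolve(1, 0))  # (9, 4096)
```

Keeping the map in a table lets the DBMS supply transactions and atomic updates instead of hand-written metadata code, which is the kind of leverage the abstract attributes to STORAGEDB; the same indirection is also where the CPU and space overheads it mentions would come from.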
All of these features are important in practice and can be implemented cheaply and reliably in STORAGEDB. We further study the two major issues in STORAGEDB: DBMS CPU overhead and space overhead. To reduce these overheads and further improve performance, we implemented SVL by modifying many internal DBMS methods, and we developed techniques including a new aggregation function, communication buffer caching, and DBMS tuning approaches. System performance, measured by CPU consumption and latency, improves by more than an order of magnitude for the embedded DBMS block virtualization solution. The overall performance of our embedded DBMS solution is comparable to that of a commercial block virtualization engine, while it delivers more of the functionality that storage systems require.
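The hybrid histogram described above is built on exponential histograms. For readers unfamiliar with that building block, here is a minimal Python sketch of the basic exponential histogram of Datar, Gionis, Indyk, and Motwani, which approximately counts the 1s among the last `window` items of a 0/1 stream. It is not the dissertation's RHist or hybrid histogram. The class name, the 0/1 input, and the `max_per_size` parameter (the cap on buckets of each size, exposed directly rather than derived from an error parameter) are assumptions for illustration. A hybrid structure along the lines described might keep one such counter per value-domain partition, but that arrangement is also an assumption.

```python
class ExponentialHistogram:
    """Approximate count of 1s among the last `window` stream items.

    Buckets are stored newest-first as (timestamp_of_newest_1, size) pairs;
    sizes are powers of two and never decrease from newest to oldest.
    """

    def __init__(self, window: int, max_per_size: int = 2) -> None:
        if window < 1 or max_per_size < 1:
            raise ValueError("need window >= 1 and max_per_size >= 1")
        self.window = window
        self.max_per_size = max_per_size
        self.time = 0
        self.buckets: list[tuple[int, int]] = []

    def add(self, bit: int) -> None:
        self.time += 1
        # Drop the oldest bucket once even its newest 1 has left the window.
        while self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.buckets.pop()
        if bit:
            self.buckets.insert(0, (self.time, 1))
            self._cascade_merge()

    def _cascade_merge(self) -> None:
        size, start = 1, 0
        while True:
            # Buckets of equal size are contiguous; find the run of `size`.
            end = start
            while end < len(self.buckets) and self.buckets[end][1] == size:
                end += 1
            if end - start <= self.max_per_size:
                return
            # Too many buckets of this size: merge the two oldest ones,
            # keeping the newer timestamp, then check the next size up.
            newer_ts = self.buckets[end - 2][0]
            self.buckets[end - 2:end] = [(newer_ts, 2 * size)]
            size, start = 2 * size, end - 2

    def estimate(self) -> int:
        if not self.buckets:
            return 0
        total = sum(size for _, size in self.buckets)
        # Only the oldest bucket can straddle the window edge; count half of it.
        return total - self.buckets[-1][1] // 2
```

With `window=3` and `max_per_size=2`, feeding three consecutive 1s already triggers a merge, so `estimate()` returns 2 against an exact count of 3. Because only the oldest bucket can be partly outside the window, the error is at most half that bucket's size, while the number of buckets grows roughly logarithmically with the window; raising `max_per_size` trades memory for a tighter estimate.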

Keywords/Search Tags:

DBMS, Data, Persistent
