Implementation Of Data Compression, Operation And Query Processing System Based On BAP

Posted on:2009-05-26

Degree:Master

Type:Thesis

Country:China

Candidate:J G Jia

GTID:2178360278964773

Subject:Computer Science and Technology

Abstract/Summary:

With the development of information technology and its wide application in finance, transportation, national defense, and environmental and ecosystem monitoring, massive data is deluging the world, which poses a great challenge to DBMSs. As disk capacity per unit price keeps rising, the real problem is no longer storing massive data itself but storing it and executing queries on it efficiently.

Massive high-frequency data contains a great deal of redundancy: the same values appear repeatedly in different places. Such redundancy not only wastes storage but also degrades query performance. Making full use of compressed database technology can reduce both storage and I/O bandwidth. Research on compressed databases covers the design of compression algorithms and of algorithms for querying compressed data.

Column-oriented database architectures have attracted renewed interest in recent years. For read-mostly query workloads, such as those found in data warehouses and decision support applications, "column-stores" have been shown to perform particularly well relative to "row-stores". Storing data in columns also offers more opportunities for compression to improve performance than row-oriented architectures do.

Building on existing relational database techniques, this thesis studies data compression methods and storage architectures suited to high-frequency data, together with the corresponding query processing techniques, including data operations and query optimizations. The main results are as follows:

It proposes a compression and storage strategy called TIDC. TIDC is a column-oriented compression method based on attribute partitioning. It uses position information (called TupleID in the thesis) to connect all the attributes in the database.
By storing only the position and value of the non-constant data within each attribute, TIDC reduces storage while keeping a complete mapping from the original data to the compressed data. Queries can be answered directly on the compressed data without decompressing it. The thesis presents data operation algorithms for selection, projection and join on TIDC-compressed data, along with several optimization strategies.

For the BAP method, the thesis proposes a compression algorithm, data operation algorithms for selection, projection and join, and several query processing optimization strategies.

A prototype compressed DBMS based on these techniques has been implemented. Theoretical analysis and preliminary experimental results show that column-oriented compression and storage based on attribute partitioning can greatly reduce storage space, lower query I/O cost, and improve query efficiency. Moreover, query efficiency under TIDC is less affected by data volume than under BAP.
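To make the TIDC idea more concrete, here is a minimal Python sketch under stated assumptions; it is not the thesis's implementation. It assumes each attribute partition keeps one "constant" value (taken here to be the column's most frequent value) and stores only (TupleID, value) pairs for entries that differ from it. The names `TidcColumn`, `select_eq` and `project` are hypothetical. The sketch shows how an equality selection and a projection can be evaluated on this form without rebuilding whole columns.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class TidcColumn:
    """Hypothetical TIDC-style attribute partition.

    Assumption: the "constant" is the column's most frequent value, and only
    entries that differ from it are stored, keyed by TupleID (row position).
    """
    n_tuples: int
    constant: Any
    exceptions: Dict[int, Any] = field(default_factory=dict)

    @classmethod
    def compress(cls, values: List[Any]) -> "TidcColumn":
        # Pick the most frequent value as the constant (None for an empty column).
        constant = Counter(values).most_common(1)[0][0] if values else None
        # Keep (TupleID, value) only for non-constant entries.
        exceptions = {tid: v for tid, v in enumerate(values) if v != constant}
        return cls(len(values), constant, exceptions)

    def value_at(self, tid: int) -> Any:
        # Point lookup by TupleID without decompressing the column.
        return self.exceptions.get(tid, self.constant)

    def decompress(self) -> List[Any]:
        # Full reconstruction, shown only to check the mapping is lossless.
        return [self.value_at(tid) for tid in range(self.n_tuples)]

    def select_eq(self, target: Any) -> List[int]:
        """Return TupleIDs whose value equals target, using the compressed form."""
        if target == self.constant:
            # Every TupleID that is not an exception holds the constant.
            return [tid for tid in range(self.n_tuples) if tid not in self.exceptions]
        # Otherwise only the stored exceptions can match.
        return sorted(tid for tid, v in self.exceptions.items() if v == target)


def project(columns: Dict[str, TidcColumn], names: List[str],
            tids: List[int]) -> List[Tuple[Any, ...]]:
    # TupleIDs link the attribute partitions back into rows.
    return [tuple(columns[n].value_at(tid) for n in names) for tid in tids]


# Usage: a tiny high-frequency table with mostly repeated values.
table = {"station": ["A", "A", "A", "B", "A"], "temp": [20, 20, 21, 20, 20]}
columns = {name: TidcColumn.compress(vals) for name, vals in table.items()}
hits = columns["temp"].select_eq(20)       # TupleIDs 0, 1, 3, 4
rows = project(columns, ["station", "temp"], hits)
```

Joins and the BAP method are left out, since the abstract does not describe how either is carried out on compressed data.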

Keywords/Search Tags:

Massive Data, High Frequency Data, Compressed Database, Data Operation, Column-Compressed Storage System

Related items

1	The Research On Compression And Query Processing Methods Of Scientific And Statistical Databases
2	Research On Compression, Operation And Query Processing Methods Of Massive Datasets
3	Research On Non-Decompression Algebra Operation Algorithm On Compressed Data
4	Research And Implementation Of Data Compression Based On Column-Oriented Database System
5	Research On Several Data Mining Algorithms For Massive RFID Data
6	Compressed Storage Processing Of The Sensor Data. Cloud Storage Environment
7	Design And Implementation Of Data Dictionaries In Column Storage DWMS
8	Research And Implementation Of Compression For Structured Data On Hadoop Platform
9	Research On Data Mining Algorithm Based On Compressed Database
10	Methods for querying compressed wavefields