Research On Compression, Operation And Query Processing Methods Of Massive Datasets

Posted on:2009-11-26

Degree:Master

Type:Thesis

Country:China

Candidate:C H Zhang

Full Text:PDF

GTID:2178360278464392

Subject:Computer Science and Technology

Abstract/Summary:

Information technology has developed rapidly, and we have entered an era of massive data. Studying how to manage massive data has become an urgent task as society grows more information-driven. Storing and managing such data efficiently, while still supporting SQL queries effectively, is a major challenge for database management systems (DBMS).

Massive databases, such as scientific and statistical databases, are widely used in earthquake monitoring, weather forecasting, physics and chemistry experiments, and similar fields. Such databases contain a great deal of redundancy: the same data appear repeatedly in different places. Storing the data directly not only wastes storage but also degrades query performance. In addition, the relation schema is relatively stable, and the set of candidate values for each attribute is limited. Newly arriving data are only appended to the end of the current data area, without updating existing data. Queries typically involve only a small minority of the many attributes.

Compressed database technology combines data compression with database technology to handle storage and querying of massive databases. It comprises data compression methods, data operation algorithms, and query processing techniques.

In this thesis, we propose a new compression method and storage architecture that are suited to massive databases and that support data operations and query processing efficiently.

The proposed compression method adopts the idea of Column-Compressed Storage and uses Binary Encoding, Unary Encoding, K-of-N Encoding, and Superimposed Encoding to compress massive data. The encoded data are then stored according to encoding bit, using an extended run-length encoding.

We also propose algorithms that operate on compressed data without decompression, including selection and projection. Operations on the original data are converted into operations on the compressed bit files, which are simple to implement. A prototype for compressing and querying massive databases is designed and implemented using these techniques. Theoretical analysis and preliminary experimental results show that compression with column-oriented storage can reduce storage space, lower query cost, and improve query efficiency.
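
The abstract does not spell out the encoding or file layout, so the following is only a minimal sketch of the general idea, not the thesis's actual method. It assumes a dictionary-based binary encoding of one column, one bit vector per encoding bit, and plain (not extended) run-length encoding. All function names (`binary_encode`, `bit_slices`, `rle`, `rle_and`, `select_eq`, `row_ids`) are hypothetical. An equality selection is answered by combining the run-length encoded bit vectors run by run, without expanding them back into rows.

```python
from itertools import groupby

def binary_encode(column):
    """Assign each distinct value a fixed-width integer code (assumed scheme)."""
    dictionary = {v: i for i, v in enumerate(sorted(set(column)))}
    width = max(1, (len(dictionary) - 1).bit_length())
    return dictionary, width, [dictionary[v] for v in column]

def bit_slices(codes, width):
    """Slice b holds bit b (0 = least significant) of every row's code."""
    return [[(c >> b) & 1 for c in codes] for b in range(width)]

def rle(bits):
    """Plain run-length encoding as (bit, run_length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]

def rle_not(runs):
    """Complement a run-length encoded bit vector."""
    return [(1 - bit, n) for bit, n in runs]

def rle_and(a, b):
    """AND two run-length encoded vectors of equal total length, run by run."""
    out, i, j = [], 0, 0
    rem_a = a[0][1] if a else 0
    rem_b = b[0][1] if b else 0
    while i < len(a) and j < len(b):
        step = min(rem_a, rem_b)
        bit = a[i][0] & b[j][0]
        if out and out[-1][0] == bit:
            out[-1] = (bit, out[-1][1] + step)  # merge with previous run
        else:
            out.append((bit, step))
        rem_a -= step
        rem_b -= step
        if rem_a == 0:
            i += 1
            rem_a = a[i][1] if i < len(a) else 0
        if rem_b == 0:
            j += 1
            rem_b = b[j][1] if j < len(b) else 0
    return out

def select_eq(slices_rle, dictionary, width, value):
    """Run-length encoded bitmap of rows whose value equals `value`.

    Assumes `value` is present in the dictionary.
    """
    code = dictionary[value]
    result = None
    for b in range(width):
        s = slices_rle[b] if (code >> b) & 1 else rle_not(slices_rle[b])
        result = s if result is None else rle_and(result, s)
    return result

def row_ids(runs):
    """Expand only the final result bitmap into matching row numbers."""
    ids, pos = [], 0
    for bit, n in runs:
        if bit:
            ids.extend(range(pos, pos + n))
        pos += n
    return ids

# Made-up sample column for illustration only.
column = ["rain", "rain", "sun", "sun", "sun", "snow", "rain", "rain"]
dictionary, width, codes = binary_encode(column)
slices_rle = [rle(s) for s in bit_slices(codes, width)]
print(row_ids(select_eq(slices_rle, dictionary, width, "sun")))
```

In this layout, projection reduces to reading only the bit files of the requested columns, which is one reason column-oriented storage suits queries that touch few attributes.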

Keywords/Search Tags:

Massive Data, Scientific and Statistical Database, Compressed Database, Column-Compressed Storage

Related items

1	The Research On Compression And Query Processing Methods Of Scientific And Statistical Databases
2	Implementation Of Data Compression, Operation And Query Processing System Based On BAP
3	Research And Implementation Of Data Compression Based On Column-Oriented Database System
4	Study On The Analysis And Optimization Of Column Storage Performance Based On Hive On Spark
5	Research On Non-Decompression Algebra Operation Algorithm On Compressed Data
6	Research Of Compression Algorithm For Sparse Data In Column-oriented Database
7	Research On Key Technologies Of Column-Oriented Database For Big Data
8	Compression Algorithm Based On Support Columns Stored Data
9	Massive Data Following Database Research
10	Research And Implementation Of Statistic Over Data Streams For Massive Database System