
The Research On Compression And Query Processing Methods Of Scientific And Statistical Databases

Posted on: 2007-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: K Zhao
Full Text: PDF
GTID: 2178360185985569
Subject: Computer Science and Technology
Abstract/Summary:
Ever more information is being generated, and database management systems (DBMS) are needed to manage these massive data sets. This is a great challenge for a DBMS, which has to store and manage the data efficiently while still supporting SQL queries effectively.

Massive scientific and statistical data (for example, data from earthquake monitoring, weather forecasting, and physics and chemistry experiments) contain a great deal of redundancy: the same values appear repeatedly in different places. Such redundancy not only wastes storage but also degrades query performance. Compression is widely used for data storage and transfer because it reduces storage space and I/O bandwidth. With the emergence of massive data, compressed database technology arose from the combination of compression technology and database technology. Research on compressed databases covers the design of compression algorithms and of query algorithms over compressed data.

Scientific and Statistical Databases (SSDB) have the following characteristics. First, the relation schema is relatively stable, and each attribute has a limited set of candidate values with high redundancy. Second, newly arriving data are only appended to the end of the current data area and do not change existing data. Third, every relation is made up of many attributes, but the majority of queries involve only a few of them, and most queries are read-only. These characteristics favor integrating compressed database technology with SSDB. This thesis therefore focuses on data compression methods and storage architectures suited to SSDB and on the corresponding query processing techniques, including data operations and query optimizations.

The main results are as follows:
(1) Two compression and storage strategies for SSDB are proposed: CCSS (Column-Compressed Storage System) and RCSS (Row-Compressed Storage System). CCSS is a column-wise compressed storage system. It adopts different storage structures and encoding algorithms to compress attributes column by column at different granularities. It converts queries on the...
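
The abstract does not say which encoding algorithms CCSS uses, so the following is only a minimal sketch of one common column-wise technique, dictionary encoding, chosen because it matches the SSDB characteristics listed above (few distinct values per attribute, append-only arrival). The class name `DictEncodedColumn` and its methods are hypothetical and are not taken from the thesis. The equality lookup compares integer codes rather than decoded values, which is one general way a compressed database can answer a query without decompressing the column; whether CCSS does this is not stated in this excerpt.

```python
from typing import Dict, List


class DictEncodedColumn:
    """Hypothetical append-only column: integer codes plus a value dictionary."""

    def __init__(self) -> None:
        self.values: List[str] = []        # code -> distinct value
        self.code_of: Dict[str, int] = {}  # distinct value -> code
        self.codes: List[int] = []         # one code per row, in arrival order

    def append(self, value: str) -> None:
        # New rows only ever go at the end; existing codes are never rewritten.
        code = self.code_of.get(value)
        if code is None:
            code = len(self.values)
            self.values.append(value)
            self.code_of[value] = code
        self.codes.append(code)

    def rows_equal(self, value: str) -> List[int]:
        # Translate the literal to its code once, then scan the codes only.
        code = self.code_of.get(value)
        if code is None:
            return []
        return [i for i, c in enumerate(self.codes) if c == code]


# Illustrative data only: a low-cardinality attribute such as a station id.
station = DictEncodedColumn()
for v in ["A", "B", "A", "A", "C", "B"]:
    station.append(v)

print(station.values)           # ['A', 'B', 'C']
print(station.codes)            # [0, 1, 0, 0, 2, 1]
print(station.rows_equal("A"))  # [0, 2, 3]
```

Because each attribute is stored separately, a query that touches only a few attributes needs to read only those columns, which is the access pattern the abstract attributes to most SSDB queries.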
Keywords/Search Tags: Massive Data, Scientific and Statistic Database, Compressed Database, Column-Compressed Storage System, Row-Compressed Storage System