Font Size: a A A

Improving The Efficiency Of Bitmap Indexes For Data Warehouses

Posted on:2010-02-09Degree:MasterType:Thesis
Country:ChinaCandidate:Ali HAMADOUFull Text:PDF
GTID:2178360275482485Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
The progresses of the computer technology have enabled the development of new kinds of very large database applications such as data warehouses and scientific databases. Unlike the traditional databases systems, these data sets are characterized by the high dimensionality, and are mostly read-only and append-only systems.A data warehouse is large repository of information containing data collected from different data sources over a long period of time. The warehouse data are used for analytical purposes by the knowledge workers (executive, manager, analyst) of an organization to make better and faster decisions. The query performance in such an environment is a critical challenge as most of the associated queries are complex and adhoc in nature and require huge volumes of data to be processed. A promise way to speed up the query performance in data warehouses in the bitmap index. Because the bitwise logical operations are well supported by computer hardware, the bitmaps can easily and efficiently be combined to answer queries. However, as the database grow in size, the bitmap index size increases as well. It is therefore imperative to compress the index in order to improve the query performance. One of the most notable bitmap index compression schemes is the word-aligned hybrid code (WAH).This thesis aims to present an efficient bitmap indexing technique to improving the performance of the Word-Aligned Hybrid for high-cardinality attributes in data warehousing applications. WAH compression is based on the run-length encoding, and its performance depends on the presence of long runs of identical bits. For large data sets with high-cardinality attributes, in most cases, the bitmap index of the indexed attribute (column) would be sparse; hence the compression process would be ineffective. To make the bitmap compression efficient, it's therefore necessary to reorder the data warehouse tuples to achieve higher compression rates.To reorder the tuples of base data, our approach consists to simply sort the indexed attribute before generating the bitmap index. Our strategy is therefore used as a preprocessing step before compression, only to improve the performance, without affecting algorithms used for compression and query processing. Sorting of the indexed attribute ensures long runs of ones and zeros. These long runs of ones and zeros are desirable for any compression scheme that is based on run length encoding, such as WAH. To further improve the performance, we propose to bin and cluster the indexed attributed. Our experiments, conducted on five data sets with table cardinalities varying from 100,000 to 2,000,000 records, showed that our strategy significantly improves the performance of the word-aligned hybrid code. We observed that the sizes of the bitmap indices considerably decreased for various attributes cardinalities. Moreover, we also found out that the execution time measured for both equality and range queries was substantially improved.
Keywords/Search Tags:Data Warehouses, Bitmap indices, Compression, Data reorganization
PDF Full Text Request
Related items