Research On Distributed Memory Column Store Engine

Posted on:2018-06-29

Degree:Master

Type:Thesis

Country:China

Candidate:H X Zhong

GTID:2348330512983221

Subject:Engineering

Abstract/Summary:

Recent years have seen a flurry of activity in database research aimed at improving query processing over large datasets. Traditional disk-based relational databases cannot meet the needs of storing and querying big datasets, and as memory prices have fallen, in-memory computing has attracted growing attention. Representative in-memory databases use the following data storage structures:

1. Binary association tables (BATs). Because BATs contain no indexes, they perform worse than structures that do.
2. Unified tables (UTs). UTs use global dictionaries to accelerate queries, and the data are stored as subscripts into those dictionaries. However, merging data into a UT is heavyweight, because both the global dictionaries and the stored subscripts must be updated during the merge.
3. Data blocks. In current implementations, the indexes contained in each data block lose their effectiveness when the data are skewed.

For snapshot isolation, data replication and transaction management are the two typical methods. The former incurs extra memory costs, while the latter needs complex mechanisms to maintain state and control operations, which increases development and maintenance costs. Moreover, compared with the powerful parallel computing capability of the GPU, the CPU has become the bottleneck in modern heterogeneous systems.

This thesis presents an online analytical processing (OLAP) database storage system with large storage capacity and high query performance. The main contributions are:

1. A study of representative distributed in-memory database storage systems, GPU-accelerated database systems, and non-volatile memory (NVM) storage systems, followed by the proposal of a Master/Slave distributed column storage system based on memory and non-volatile memory.
2. Three data storage structures: a byte-compressed, read-optimized structure with internal and external indexes; an uncompressed, write-optimized structure with an external index; and an NVM-based structure. For querying, both CPU SIMD instructions and the GPU are used to accelerate query performance.
3. A lightweight snapshot isolation mechanism under which read and write operations do not block each other, and read operations can execute concurrently.

In traditional databases, the tree indexes alone consume about three times the memory of the original data, whereas in this system the total memory consumption, including indexes, data, and other stored information, is about three times the original data, roughly what a traditional tree index costs on its own. In addition, the system uses a reverse index, with which query performance improves by an order of magnitude. Test results show that the system achieves millisecond-level snapshots even in the worst case. In the future, the system can readily be extended into a storage system that supports both OLAP and online transaction processing (OLTP) queries, and its storage structures can also support other compression and index schemes.
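The abstract states that merging data into a unified table is heavyweight because both the global dictionary and the stored subscripts change. The following is a minimal sketch, assuming one sorted global dictionary per column; the class `GlobalDictColumn` and its methods are hypothetical illustrations, not code from the thesis. It shows why a merge can force existing subscripts to be rewritten.

```python
from bisect import bisect_left


class GlobalDictColumn:
    """Hypothetical column stored as subscripts into one sorted global dictionary."""

    def __init__(self, values):
        self.dictionary = sorted(set(values))
        self.codes = [bisect_left(self.dictionary, v) for v in values]

    def decode(self):
        return [self.dictionary[c] for c in self.codes]

    def merge(self, delta):
        """Merge new values; return how many existing subscripts had to change."""
        new_dict = sorted(set(self.dictionary) | set(delta))
        # Inserting a value into the sorted dictionary shifts the position of
        # every larger value, so old subscripts must be remapped.
        remap = [bisect_left(new_dict, v) for v in self.dictionary]
        old_codes = self.codes
        self.codes = [remap[c] for c in old_codes]
        changed = sum(1 for old, new in zip(old_codes, self.codes) if old != new)
        self.dictionary = new_dict
        # Encode the newly merged rows against the updated dictionary.
        self.codes.extend(bisect_left(new_dict, v) for v in delta)
        return changed
```

For a column holding `["b", "c", "d", "c"]`, merging `["a"]` rewrites all four existing subscripts, while merging `["z"]` rewrites none. The merge cost therefore depends on where new values sort, which is one reason a sorted global dictionary makes merges expensive.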
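The abstract does not define "byte-compressed" for the read-optimized structure. A common reading is that dictionary codes are stored in the fewest whole bytes that can hold them. The sketch below assumes that interpretation; all function names are hypothetical. It shows fixed-width packing, random access to a single row, and a sequential equality scan. In the described system this scan would be accelerated with CPU SIMD or the GPU, whereas the Python loop here only shows the logic.

```python
def byte_width(num_distinct):
    """Fewest whole bytes that can hold every code in range(num_distinct)."""
    width = 1
    while num_distinct > 256 ** width:
        width += 1
    return width


def pack_codes(codes, num_distinct):
    """Pack dictionary codes into a fixed-width little-endian byte string."""
    width = byte_width(num_distinct)
    return width, b"".join(c.to_bytes(width, "little") for c in codes)


def code_at(packed, width, row):
    """Random access to one row's code without unpacking the whole column."""
    start = row * width
    return int.from_bytes(packed[start:start + width], "little")


def scan_equal(packed, width, code):
    """Row ids whose code equals `code` (a plain sequential scan)."""
    rows = len(packed) // width
    return [r for r in range(rows) if code_at(packed, width, r) == code]
```

Because every row occupies the same number of bytes, row `r` always starts at offset `r * width`. This keeps point lookups O(1) and makes the column layout friendly to vectorized scans.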
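The abstract describes a lightweight snapshot isolation mechanism in which reads and writes do not block each other, but it does not give the mechanism's details. One way to get that property is version stamping. In the sketch below, rows are appended with the version that created them, deletions record the version that removed them, and a snapshot is simply the last committed version number, so taking a snapshot costs O(1) and copies nothing. This is an assumed design for illustration only, and `SnapshotColumn` is a hypothetical name. It relies on CPython making `list.append` and attribute assignment atomic; a C++ engine would need atomic stores with release/acquire ordering instead.

```python
import threading


class SnapshotColumn:
    """Hypothetical append-only column with version-stamped rows.

    Writers serialize among themselves on a lock; readers never take it,
    so reads and writes do not block each other and reads run concurrently.
    """

    def __init__(self):
        self._values = []        # row data, appended only
        self._inserted_at = []   # version that made each row visible
        self._deleted_at = {}    # row id -> version that deleted it
        self._version = 0        # last committed version
        self._write_lock = threading.Lock()

    def snapshot(self):
        # A snapshot is just the committed version number.
        return self._version

    def write(self, inserts=(), deletes=()):
        with self._write_lock:
            v = self._version + 1
            for value in inserts:
                self._values.append(value)      # data first ...
                self._inserted_at.append(v)     # ... then its version stamp
            for row in deletes:
                self._deleted_at.setdefault(row, v)
            self._version = v   # publish; older snapshots still ignore version v
            return v

    def read(self, snap):
        visible = []
        for row in range(len(self._inserted_at)):
            if self._inserted_at[row] > snap:
                continue                        # inserted after the snapshot
            deleted = self._deleted_at.get(row)
            if deleted is not None and deleted <= snap:
                continue                        # deleted at or before the snapshot
            visible.append(self._values[row])
        return visible
```

A reader holding an old snapshot keeps seeing a consistent view while later writes proceed, because visibility is decided purely by comparing version stamps with the snapshot number.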
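The abstract credits a reverse index with an order-of-magnitude query speedup but does not describe its layout. The sketch below assumes the usual form: each distinct column value maps to the ascending list of row ids that hold it. An equality or IN-list predicate can then be answered without scanning the column. The function names are hypothetical.

```python
from collections import defaultdict


def build_reverse_index(column):
    """Map each distinct value to the ascending list of row ids holding it."""
    index = defaultdict(list)
    for row_id, value in enumerate(column):
        index[value].append(row_id)   # row ids arrive in increasing order
    return dict(index)


def lookup(index, value):
    """Rows where column == value."""
    return index.get(value, [])


def lookup_in(index, values):
    """Rows where column IN values, returned in row order."""
    rows = []
    for v in set(values):             # ignore duplicate predicate values
        rows.extend(index.get(v, []))
    return sorted(rows)
```

Lookup cost grows with the number of matching rows rather than the column length, which is where the speedup over a full scan comes from on selective predicates.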

Keywords/Search Tags:

column store, in-memory database, single instruction multiple data, graphics processing unit, distributed system

Related items

1	Studies On CRS Crossbar Based Single-Instruction Multiple-Data Stream Computing Architectures
2	Code Generation Technology In Column-Store In-Memory Database
3	Column Store Database---A New Approach to GIS Application
4	Research And Implementation Of SDTA High Performance Memory Subsystem
5	Research And Implementation Of Query Optimizing Of Column Store In Data Warehouse Management System
6	Efficient Star Join For Column-Oriented Data Store In The MapReduce Environment
7	The Parallel Loading Technology Of Column Data Index In Distributed In-Memory Database
8	Compression Algorithm Based On Support Columns Stored Data
9	Research And Implementation Of Parallel Query Processing In Column-store
10	Research On Database Optimization And Realization Based On Simulative Column-store