
Research And Implementation On Column Storage Optimization Technology For Large Scale Relational Data

Posted on: 2018-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: H F Dong
Full Text: PDF
GTID: 2348330518993313
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, structured relational data has grown to the petabyte (PB) scale and beyond. As data volumes and application requirements keep growing, the current mainstream column storage schemes cannot meet the demand for efficient queries over relational data. This thesis targets two weaknesses of mainstream column storage structures, inefficient data indexing and the complexity of algorithm selection and computation in data compression, and designs and implements optimization schemes for both.

First, building on the mainstream column storage structure, we propose SP-RCFile, a page-based hybrid storage structure with a multi-level metadata index, and implement it through the StorageHandler extension interface provided by Hive. SP-RCFile adds an index page that stores all metadata index information for the current data segment, including segment index information, extent index information, and data page index information, among others. Queries can read the index page directly, which speeds up data location and data filtering.

Second, we propose an extent-level data compression selection strategy based on similarity computation (ECSC). ECSC takes into account the data type of each column and the local data distribution characteristics: it calculates the similarity of the data distribution characteristics of two adjacent data extents and then recommends a compression algorithm. Experiments show that, compared with the current column storage structure, the proposed scheme achieves a higher data compression rate and reduces the time overhead of the compression process.

In addition, algorithms are designed and implemented on top of SP-RCFile so that SQL queries can run directly on compressed data, which improves query efficiency in complex query scenarios.
Keywords/Search Tags: relational data, column-storage, page, multi-level index, similarity calculation
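
The abstract does not give the details of ECSC, so the following is only a minimal sketch of the general idea of extent-level compression selection by similarity, under stated assumptions. The feature set (distinct-value ratio and run ratio), the similarity measure, the 0.9 threshold, the rule table, and the algorithm labels "rle", "dictionary", and "general" are all hypothetical choices made for illustration; they are not taken from the thesis. The sketch also assumes that a sufficiently similar adjacent extent simply reuses the previous recommendation, which is one plausible way such a strategy could reduce selection overhead.

```python
def features(values):
    """Summarize an extent's local distribution (hypothetical feature set)."""
    n = len(values)
    if n == 0:
        raise ValueError("extent must not be empty")
    distinct_ratio = len(set(values)) / n
    # Number of runs divided by number of values: low means long runs.
    runs = 1 + sum(1 for a, b in zip(values, values[1:]) if a != b)
    return (distinct_ratio, runs / n)


def similarity(f, g):
    """Similarity in [0, 1]: one minus the largest feature difference."""
    return 1.0 - max(abs(a - b) for a, b in zip(f, g))


def choose_from_scratch(f):
    """Full selection step (assumed rule table, for illustration only)."""
    distinct_ratio, run_ratio = f
    if run_ratio <= 0.1:
        return "rle"         # long runs: run-length encoding
    if distinct_ratio <= 0.1:
        return "dictionary"  # few distinct values: dictionary encoding
    return "general"         # otherwise: a general-purpose compressor


def recommend(extents, threshold=0.9):
    """Return (algorithm, reused) for each extent of one column.

    If an extent is similar enough to the previous one, the previous
    recommendation is reused instead of running the full selection.
    """
    result = []
    prev_features, prev_algo = None, None
    for values in extents:
        f = features(values)
        if prev_features is not None and similarity(f, prev_features) >= threshold:
            algo, reused = prev_algo, True
        else:
            algo, reused = choose_from_scratch(f), False
        result.append((algo, reused))
        prev_features, prev_algo = f, algo
    return result


extents = [
    [7] * 100,                    # one long run
    [7] * 50 + [8] * 50,          # two long runs, similar to the first
    [i % 4 for i in range(100)],  # four distinct values, no runs
    list(range(100)),             # all values distinct
]
for algo, reused in recommend(extents):
    print(algo, "(reused)" if reused else "(selected)")
```

In this sketch the second extent reuses the recommendation of the first because their feature vectors are nearly identical, while the third and fourth extents trigger a fresh selection. A real implementation would also need to account for the column data type, which the abstract names as an input to ECSC.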