Study On The Analysis And Optimization Of Column Storage Performance Based On Hive On Spark

Posted on: 2018-01-23  Degree: Master  Type: Thesis
Country: China  Candidate: W Nie
GTID: 2428330566995786  Subject: Software engineering
Abstract/Summary:
Traditional data storage solutions generally use a row-oriented storage engine. Measured against the shortcomings of row storage, a column storage engine has clear advantages: columnar data compresses at a higher ratio than row-based data, and reading only the columns a query needs reduces I/O. Both properties make columnar storage attractive and cost-effective for very large data sets, and the widely used ORC and Parquet columnar file formats emerged against this background. To optimize columnar storage, this thesis works in a Hive on Spark environment, generates data with the TPC-DS benchmark framework, and compares ORC and Parquet on the same amount of generated data. In the first comparison, the ORC data is written without any compression, while the Parquet data is compressed with run-length encoding and dictionary encoding, and the size of the generated data is recorded for each format. Comparing the two data volumes shows whether compression improves storage efficiency and determines the optimization plan for ORC: the read and write paths of ORC column storage are redesigned and suitable encodings are added. The difference before and after optimization is then measured experimentally. In the second comparison, the ORC and Parquet storage formats are again specified when the data is generated; the difference is that the ORC data is now written with run-length encoding and dictionary encoding. Compared with the earlier results, the optimized ORC storage is substantially more efficient, which indicates that applying a suitable encoding to certain types of column data can improve disk storage performance.
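The two encodings named in the abstract can be illustrated with a minimal Python sketch. It is not the ORC or Parquet implementation, whose encoders are more elaborate; the function names and the sample column below are hypothetical and chosen only to show why low-cardinality or repetitive column data shrinks under these encodings.

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse each run of equal adjacent values into a (value, count) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original column."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


def dictionary_encode(column):
    """Replace each value with a small integer id; return (dictionary, ids)."""
    dictionary = []  # id -> value, in order of first appearance
    index = {}       # value -> id
    ids = []
    for value in column:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        ids.append(index[value])
    return dictionary, ids


def dictionary_decode(dictionary, ids):
    """Look each id up in the dictionary to restore the original column."""
    return [dictionary[i] for i in ids]


# Hypothetical sample: a low-cardinality string column whose values form runs.
column = ["CA"] * 4 + ["NY"] * 3 + ["TX"] * 2

runs = run_length_encode(column)             # 3 pairs instead of 9 values
dictionary, ids = dictionary_encode(column)  # 3 distinct strings plus 9 small ints
```

Run-length encoding pays off when equal values sit next to each other, and dictionary encoding pays off when a column has few distinct values, which is consistent with the abstract's conclusion that the benefit depends on the type of column data.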
Keywords/Search Tags: Big data, Column storage, Database storage, Compression ratio