Research On Optimization Of Big Data Storage Structure And Query

Posted on:2015-06-11

Degree:Master

Type:Thesis

Country:China

Candidate:K D Zhou

Full Text:PDF

GTID:2298330434965773

Subject:Computer application technology

Abstract/Summary:


Big data requires not only a mass storage system but also fast data loading, fast query processing, high storage space utilization, and the ability to adapt to highly dynamic workloads. Traditional relational databases face various difficulties and obstacles in managing big data, and new distributed systems have emerged in response. These systems still have shortcomings in big data storage and querying, so this thesis optimizes the distributed system in two respects: the data storage structure and the correlations among MapReduce jobs.

In a distributed system, the data storage structure directly affects the storage efficiency and processing performance of big data. In the row store structure, data is loaded locally and quickly, but queries also load columns they do not need, and the data is hard to compress. The column store structure compresses efficiently but incurs additional network transfer overhead. To overcome these shortcomings, this thesis presents a new data storage structure that combines row and column storage. The experimental results show that it is slightly slower than the row store structure in data loading and that it has high compression efficiency compared with the row store and column store structures. It not only avoids additional disk I/O but also cuts the unnecessary network transfer time of the column store. The row-column store can therefore greatly improve big data storage and processing performance in a distributed system.

Existing methods of translating SQL to MapReduce perform poorly on complex SQL queries. The reason is that they neglect the correlations among MapReduce jobs, which results in a large number of redundant jobs and unnecessary resource consumption and sharply reduces query performance. This thesis optimizes query performance through input correlation, data conversion correlation, and job flow correlation, and gives the optimization conditions and optimization rules. Merging redundant MapReduce jobs reduces unnecessary resource consumption and thereby improves query speed.
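The abstract does not describe the layout of the row-column store, so the following is only a minimal sketch of one common way to combine the two structures, assuming an RCFile-style design: rows are first split into horizontal groups (so all columns of a record stay together on one node), and each group is then stored column by column (so similar values compress together and unneeded columns are skipped). The names `build_row_groups`, `scan_columns`, and `group_size` are hypothetical and do not come from the thesis.

```python
import json
import zlib


def build_row_groups(rows, columns, group_size):
    """Split rows into horizontal groups, then store each group column by column."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        group = {}
        for col in columns:
            values = [row[col] for row in chunk]
            # Values of a single column are stored and compressed together.
            group[col] = zlib.compress(json.dumps(values).encode("utf-8"))
        groups.append(group)
    return groups


def scan_columns(groups, wanted):
    """Yield records containing only the requested columns.

    Columns that are not requested are never decompressed.
    `wanted` must contain at least one column name.
    """
    for group in groups:
        decoded = {
            col: json.loads(zlib.decompress(group[col]).decode("utf-8"))
            for col in wanted
        }
        row_count = len(decoded[wanted[0]])
        for i in range(row_count):
            yield {col: decoded[col][i] for col in wanted}


# Hypothetical sample data: five rows with three columns.
rows = [{"id": i, "city": "X", "amount": i * 10} for i in range(5)]
groups = build_row_groups(rows, ["id", "city", "amount"], group_size=2)
projected = list(scan_columns(groups, ["id", "amount"]))
```

In this sketch a query that needs only `id` and `amount` never touches the compressed `city` data, while every row group still holds complete records, which is the property that lets a row group be placed on a single node.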
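The idea of merging correlated MapReduce jobs can be illustrated with a small in-memory sketch. It assumes that input correlation means two jobs scan the same input, and it covers only that case; the optimization conditions and rules of the thesis are not given in the abstract and are not reproduced here. `run_job`, the sample `orders` data, and all mapper and reducer names are hypothetical.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Tiny stand-in for one MapReduce job: map, shuffle by key, reduce."""
    shuffled = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            shuffled[key].append(value)
    return {key: reducer(key, values) for key, values in shuffled.items()}


# Two hypothetical queries that both scan the same input.
def map_total(order):
    yield order["customer"], order["amount"]


def map_count(order):
    yield order["customer"], 1


def reduce_sum(key, values):
    return sum(values)


# Reducer to use for each original job, looked up by tag.
REDUCERS = {"total": reduce_sum, "count": reduce_sum}


def merged_mapper(order):
    # A single scan emits the pairs of both jobs, tagged by originating job.
    for key, value in map_total(order):
        yield ("total", key), value
    for key, value in map_count(order):
        yield ("count", key), value


def merged_reducer(tagged_key, values):
    # Dispatch to the reducer of the job that produced this key.
    tag, key = tagged_key
    return REDUCERS[tag](key, values)


orders = [
    {"customer": "a", "amount": 10},
    {"customer": "b", "amount": 5},
    {"customer": "a", "amount": 7},
]

# Unmerged: two jobs, so the input is scanned twice.
totals = run_job(orders, map_total, reduce_sum)
counts = run_job(orders, map_count, reduce_sum)

# Merged: one job, one scan, same results under tagged keys.
merged = run_job(orders, merged_mapper, merged_reducer)
```

The merged job reads `orders` once instead of twice and launches one job instead of two, at the cost of tagging each intermediate key with the job it belongs to.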

Keywords/Search Tags:

big data, MapReduce, row-column store, query optimization, distributed system


Related items

1	Research And Implementation Of Key Techniques For Query Rewriting In Column-Store Data Warehouse
2	Optimization And Implementation For DWMS Column-Store Query Execution Engine
3	Research And Implementation Of Query Optimizing Of Column Store In Data Warehouse Management System
4	Research On Query Optimization In Column-Oriented Data Warehouse
5	Research On Database Optimization And Realization Based On Simulative Column-store
6	Multi-Query Optimization Strategy Design And Implementation In Column-based OLAP System
7	The Optimization Of The Query Execution Engine In Column Oriented DWMS
8	Research And Implementation Of Parallel Query Processing In Column-store
9	Compression Algorithm Based On Support Columns Stored Data
10	Based On The Research And Application Of Column Stores RFID Data Management Technology