Research On Optimization Of Big Data Storage Structure And Query

Posted on: 2015-06-11
Degree: Master
Type: Thesis
Country: China
Candidate: K D Zhou
Full Text: PDF
GTID: 2298330434965773
Subject: Computer application technology
Abstract/Summary:
Big data requires not only a mass storage system, but also fast data loading, fast query processing, high utilization of storage space, and the ability to adapt to highly dynamic loads. Traditional relational databases face various difficulties and obstacles in managing big data. New distributed systems have emerged in response, but they still have shortcomings in big data storage and query. This thesis optimizes distributed systems from two aspects: the data storage structure and the correlations among MapReduce jobs.

In a distributed system, the data storage structure directly affects the storage efficiency and processing performance of big data. In the row store structure, data is loaded locally and loading is fast, but queries also read columns they do not need, and the data is hard to compress. The column store structure has high compression efficiency, but it incurs additional network transfer overhead. To overcome these shortcomings and improve the data storage structure, this thesis presents a new data storage structure that combines row and column storage. Experimental results show that it is slightly inferior to the row store structure in data loading, and that it achieves high compression efficiency compared with both the row store structure and the column store structure. It not only avoids additional disk I/O, but also cuts the unnecessary network transfer time incurred by the column store. The row-column store can therefore greatly improve big data storage and processing performance in a distributed system.

Existing approaches to SQL-to-MapReduce translation perform poorly on complex SQL queries. The reason is that they neglect the correlations among MapReduce jobs, which results in a large number of redundant jobs and unnecessary consumption of resources, and thereby reduces query performance sharply. This thesis optimizes query performance from three aspects, namely input correlation, data conversion correlation, and job flow correlation, and gives the optimization conditions and optimization rules. Merging redundant MapReduce jobs reduces unnecessary consumption of resources and so improves data query speed.
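
The combined row and column layout described above can be illustrated with a minimal sketch. This is not the thesis's actual implementation, which the abstract does not specify. It assumes a common hybrid design in which rows are first split horizontally into groups and each group is then stored column by column. The names to_row_groups, read_column, rows_of_group, and the sample schema are hypothetical.

```python
from typing import Any, List, Sequence, Tuple

Row = Tuple[Any, ...]
RowGroup = List[List[Any]]  # one list of values per column


def to_row_groups(rows: Sequence[Row], group_size: int) -> List[RowGroup]:
    """Split rows horizontally into groups, then lay out each group by column."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    groups: List[RowGroup] = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        # Transpose the chunk so values of the same column sit together.
        groups.append([list(column) for column in zip(*chunk)])
    return groups


def read_column(groups: Sequence[RowGroup], index: int) -> List[Any]:
    """Read a single column across all groups without touching other columns."""
    values: List[Any] = []
    for columns in groups:
        values.extend(columns[index])
    return values


def rows_of_group(columns: RowGroup) -> List[Row]:
    """Rebuild full rows from one group; all needed columns are in that group."""
    return list(zip(*columns))


# Hypothetical sample data: (id, city, status)
sample = [(1, "A", "ok"), (2, "A", "ok"), (3, "B", "fail")]
layout = to_row_groups(sample, group_size=2)
cities = read_column(layout, 1)
```

Because every column of a given row lives in the same group, a reader that holds one group can rebuild its rows locally, while a query on one column skips the others. How the groups are sized and placed on nodes is left open here.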
Keywords/Search Tags: big data, MapReduce, row-column store, query optimization, distributed system