
Massive Data Aggregation And Parallel Implementation With Complex Constraints

Posted on: 2015-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: W L Liu
Full Text: PDF
GTID: 2308330479489711
Subject: Computer Science and Technology
Abstract/Summary:
Massive data aggregation algorithms are the core of Online Analytical Processing (OLAP). OLAP handles data for Business Intelligence (BI), an important but complex task. Using an N-dimensional data model (the data cube) generated by aggregation algorithms, OLAP can respond to queries in a very short time; at the same time, the development of BI has made range aggregation queries with complex constraints (multi-dimensional range queries) more significant. With the development of computer science, analytical queries are becoming more complex, the dimensionality of the data cube continues to rise, and data volumes keep growing. These changes pose a serious challenge to OLAP systems, especially for the aggregation of massive data.

CUDA is an integrated technology launched by NVIDIA that brings the graphics processor (Graphics Processing Unit, GPU) to high-performance computing. Compared with a traditional CPU, a GPU has more computing power and larger internal bandwidth, so designing new aggregation algorithms suited to the GPU has become a hot topic in the OLAP field.

For massive data aggregation algorithms on the GPU architecture, the main work is as follows:
(1) Analyzed the main factors that distinguish classic CPU algorithms from new GPU algorithms, examined their respective advantages and disadvantages, and summarized what affects the performance of aggregation algorithms.
(2) For parallel aggregation, proposed the Multi-dimensional Prefix Tree Model, which partitions the data space by prefix encoding and compresses and stores the raw data. Proposed parallel construction algorithms; comparative experiments showed a speedup of 6 times over databases.
(3) Based on the Multi-dimensional Prefix Tree Model and using the GPU, proposed a parallel selection and aggregation algorithm that is oriented to massive data and can quickly answer queries with complex constraints. Comparative experiments showed a speedup of 6 times over databases and a speedup of 1.3 times over similar GPU-based algorithms.
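
To make the idea of answering a multi-dimensional range query from a prefix-organized structure more concrete, the following is a minimal sequential sketch in Python. It is not the thesis's Multi-dimensional Prefix Tree Model or its GPU algorithm, whose details are not given in this abstract. It assumes a plain prefix tree keyed by integer dimension values, with a pre-aggregated sum stored at every node; the names `Node`, `build`, and `range_sum`, and the sample dimensions (year, region, product), are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Sequence, Tuple


@dataclass
class Node:
    """One prefix-tree node; `total` sums the measure of every record below it."""
    total: float = 0.0
    children: Dict[int, "Node"] = field(default_factory=dict)


def build(records: Sequence[Tuple[Sequence[int], float]]) -> Node:
    """Insert each (dimension values, measure) pair along its prefix path.

    Assumption: every record has the same number of dimensions.
    """
    root = Node()
    for dims, measure in records:
        node = root
        node.total += measure
        for value in dims:
            node = node.children.setdefault(value, Node())
            node.total += measure
    return root


def range_sum(node: Node, ranges: Sequence[Tuple[int, int]], depth: int = 0) -> float:
    """Sum the measure over records whose leading dimensions fall in `ranges`.

    `ranges[i]` is an inclusive (lo, hi) bound on dimension i. Once every
    bound has been checked, the node's stored total is returned directly,
    so the subtree below it is never visited.
    """
    if depth == len(ranges):
        return node.total
    lo, hi = ranges[depth]
    return sum(
        range_sum(child, ranges, depth + 1)
        for value, child in node.children.items()
        if lo <= value <= hi
    )


# Hypothetical sample data: (year, region, product) -> sales amount.
records = [
    ((2014, 1, 10), 5.0),
    ((2014, 2, 10), 7.0),
    ((2015, 1, 20), 11.0),
    ((2015, 3, 10), 13.0),
]
root = build(records)
print(range_sum(root, [(2014, 2015), (1, 2), (10, 20)]))  # constrains all three dimensions
print(range_sum(root, [(2015, 2015)]))                    # constrains only the year
```

In this sketch, subtrees whose key falls outside a range are skipped entirely, and a query that constrains only a prefix of the dimensions is answered from stored totals. A GPU version would presumably evaluate many branches in parallel, but how the thesis does so is not stated here.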
Keywords/Search Tags: parallel, aggregation algorithm, data cube, GPU, multi-dimensional prefix tree, multi-dimensional range query