Massive Data Aggregation And Parallel Implementation With Complex Constraints

Posted on:2015-01-10

Degree:Master

Type:Thesis

Country:China

Candidate:W L Liu

GTID:2308330479489711

Subject:Computer Science and Technology

Abstract/Summary:

Massive data aggregation algorithms are the core of Online Analytical Processing (OLAP). OLAP handles data for Business Intelligence (BI), an important but complex task. Using an N-dimensional data model (the data cube) generated by aggregation algorithms, OLAP can answer queries in a very short time; at the same time, the development of BI has made range aggregation queries with complex constraints (multi-dimensional range queries) more significant. As analytical queries become more complex, the number of data cube dimensions keeps rising, and data volumes keep growing, OLAP systems face a serious challenge, especially when aggregating massive data.

CUDA, a technology launched by NVIDIA, brings the graphics processing unit (GPU) to high-performance computing. Compared with a traditional CPU, a GPU offers more computing power and higher internal bandwidth, so designing new aggregation algorithms suited to the GPU has become a hot topic in the OLAP field.

For massive data aggregation algorithms on the GPU architecture, the main work of this thesis is as follows:

(1) Analyzed the main factors that distinguish classic CPU algorithms from new GPU algorithms, examined their respective advantages and disadvantages, and summarized what affects the performance of aggregation algorithms.

(2) Proposed a Multi-dimensional Prefix Tree Model for parallel aggregation, which divides the data space by prefix encoding and compresses and stores the raw data. Proposed parallel construction algorithms for the model; comparative experiments showed a 6x speedup over databases.

(3) Based on the Multi-dimensional Prefix Tree Model and using the GPU, proposed a parallel selection and aggregation algorithm that is oriented to massive data and can quickly answer queries with complex constraints. Comparative experiments showed a 6x speedup over databases and a 1.3x speedup over similar GPU-based algorithms.
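The abstract does not give the layout of the Multi-dimensional Prefix Tree or its GPU kernels, so the following is only a minimal sequential sketch of the general idea behind items (2) and (3), under stated assumptions: rows are tuples of integer dimension keys plus one numeric measure, each tree level corresponds to one dimension, and every node stores the aggregate of all rows sharing its key prefix. The names `Node`, `build`, `range_sum`, and `ROWS` are hypothetical and are not taken from the thesis; no compression or parallel construction is shown.

```python
from dataclasses import dataclass, field
from typing import Dict, Sequence, Tuple


@dataclass
class Node:
    """One prefix-tree node: the aggregate of every row sharing this key prefix."""
    total: float = 0.0
    count: int = 0
    children: Dict[int, "Node"] = field(default_factory=dict)


def build(rows: Sequence[Tuple[Tuple[int, ...], float]]) -> Node:
    """Insert each (dimension keys, measure) row along its key path.

    Assumption: dimensions are visited in a fixed order, one tree level each.
    """
    root = Node()
    for keys, measure in rows:
        node = root
        node.total += measure
        node.count += 1
        for key in keys:
            node = node.children.setdefault(key, Node())
            node.total += measure
            node.count += 1
    return root


def range_sum(node: Node, ranges: Sequence[Tuple[int, int]], depth: int = 0) -> float:
    """Sum the measures of rows whose key in dimension d lies in ranges[d] (inclusive).

    If fewer ranges than dimensions are given, the stored prefix aggregate
    is returned directly, without visiting the deeper levels.
    """
    if depth == len(ranges):
        return node.total
    low, high = ranges[depth]
    return sum(
        range_sum(child, ranges, depth + 1)
        for key, child in node.children.items()
        if low <= key <= high
    )


# Hypothetical sample data: (year, region id, product id) -> sales amount.
ROWS = [
    ((2014, 1, 10), 5.0),
    ((2014, 2, 10), 7.0),
    ((2015, 1, 11), 3.0),
    ((2015, 1, 10), 4.0),
]

if __name__ == "__main__":
    tree = build(ROWS)
    # Range query over all three dimensions.
    print(range_sum(tree, [(2014, 2015), (1, 1), (10, 10)]))
    # Constraint on the first dimension only: answered from the prefix aggregate.
    print(range_sum(tree, [(2015, 2015)]))
```

In a GPU setting, the sibling subtrees that fall inside a range are independent of one another, which is one plausible place for parallel evaluation; how the thesis actually maps the work onto CUDA threads is not stated in the abstract.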

Keywords/Search Tags:

parallel, aggregation algorithm, data cube, GPU, multi-dimensional prefix tree, multi-dimensional range query

Related items

1	Research And Implementation Of Multi-dimensional Association Rules Based On Prefix Tree
2	Research On Aggregation For Complex Query Based On Data Cube
3	Design and Implementation of Routing Algorithms for Supporting Multi-dimensional Range Query in HD Tree
4	Design And Implementation Of Query Analysis Client Of OLAP
5	A Two-dimensional Index Structure Based P2P Query Of Multi-dimensional Data
6	Research On Multi-Dimension Query Analysis Algorithm
7	Research On Key Issues Of Data Stream Multi-dimensional Modeling And Querying
8	Research On Parallelled Data Cube Computing Method Based On Multi-core CPU
9	Research On Multi-dimensional Association Rules Mining In Distributed Environments Based On Advanced Sql Query
10	The Study Of Multi-dimensional Index And Maintenance Method Based On HBase