
GPU-based Data Management

Posted on: 2009-12-09 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: K Yang | Full Text: PDF
GTID: 1118360248954262 | Subject: Computer Science and Technology
Abstract/Summary:
Database technology is fundamental to processing data and managing information. Data management has three important aspects: data access, data operation, and visualization. All three currently face challenges in performance and efficiency. The graphics processing unit (GPU) is a distinctive class of hardware with strong capabilities in memory access, computation, and visualization, which offers an opportunity to meet these challenges. In this thesis, we study GPU-based methods for data access, operation, and visualization in order to improve the performance and efficiency of data management.

Surveying general-purpose computation on GPUs (GPGPU), and in particular the GPU's applications in database technology, we identify several useful functional layers of the GPU, including general-purpose parallel computing, graphics pipeline computing, and interactive visualization. We use general-purpose parallel computing to accelerate data access, use graphics pipeline computing to accelerate data operations, and use the GPU's dual nature as both a parallel computing device and a graphics processing device to accelerate and improve visual data analysis. Together, these three parts of the work form a system for Data management On GPUs (DOG).

Specifically, for data access, we use general-purpose computing methods to develop a GPU-based access framework consisting of guidelines and primitives. The guidelines are matched to the GPU's parallel hardware characteristics, and the primitives are building blocks for general index bulk-loading and query methods. Based on this framework, we implement access methods for three indexes: the grid file, the quadtree, and the R-tree.
In experiments, the GPU algorithms are generally several (up to ten) times faster than their multi-core CPU counterparts that use existing methods.

For data operation, we use the graphics pipeline to develop a set of data operation primitives that serve as building blocks for general operators. We use these primitives to implement four typical join methods: nested loop, indexed nested loop, sort-merge, and hash. In experiments, the GPU algorithms are up to seven times faster than their multi-core CPU counterparts that use existing methods.

We study two visual analysis problems. The first is the relationship between multidimensional datasets, for which we propose an information visualization method called Parallel Scatterplots. It combines parallel coordinates and scatterplots, integrates multiple techniques, and facilitates effective examination and analysis of the join relationships between multidimensional datasets. To reduce visual clutter on large data, we propose a highly efficient GPU clustering algorithm based on space-filling curves, which speeds up clustering by a factor of twenty. We integrate join, clustering, and visualization into a GPU-based system that performs interactive join and clustering, together with high-quality interactive visualization, on tens of millions of data items.

The second visual analysis problem is data cubes in On-Line Analytical Processing (OLAP). For this, we propose GPU-based Interactive 3D Cubes, built on a Rendering-As-Aggregation (RAA) algorithm that maps distributive OLAP aggregations to the intrinsic rendering mechanisms of the GPU. Because computing the cube is also the process of visualizing it, the GPU's dual nature as a parallel computing and graphics processing device is fully exploited, and system performance is greatly improved.
Our method requires no precomputation time or extra storage, and it provides interactive cube computation, three-dimensional OLAP operations, and high-quality interactive visualization for tens of millions of data items.

Overall, compared with existing methods, the DOG system achieves high speedups and evident improvements in visualization quality, introduces novel methods and approaches, and is practical for real applications.
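
The Rendering-As-Aggregation idea mentioned in the abstract can be illustrated with a small CPU-side sketch. This is not the thesis's implementation. It assumes that the "intrinsic rendering mechanism" is additive blending, where every record is drawn as one point and the target cell accumulates the measure. The function name `render_as_aggregation`, the `bins` mapping, and the sample data are all hypothetical.

```python
from collections import defaultdict


def render_as_aggregation(records, dims, measure, bins):
    """Emulate additive blending on a render target (assumed mechanism).

    Each record is "drawn" as a single point. Its cell is chosen by the
    record's dimension values, and the cell accumulates the measure, which
    yields a SUM aggregate per cell of the cube.
    """
    framebuffer = defaultdict(float)  # hypothetical stand-in for a GPU render target
    for rec in records:
        # Map each dimension value to a discrete cell coordinate (the "pixel").
        cell = tuple(bins[d].index(rec[d]) for d in dims)
        # Additive blending: the incoming fragment is added to the stored value.
        framebuffer[cell] += rec[measure]
    return dict(framebuffer)


# Hypothetical fact table with two dimensions and one measure.
sales = [
    {"region": "east", "year": 2007, "amount": 10.0},
    {"region": "east", "year": 2008, "amount": 5.0},
    {"region": "west", "year": 2007, "amount": 7.0},
    {"region": "east", "year": 2007, "amount": 2.5},
]
bins = {"region": ["east", "west"], "year": [2007, 2008]}

# Two-dimensional cube: SUM(amount) grouped by (region, year).
cube = render_as_aggregation(sales, ["region", "year"], "amount", bins)

# Roll-up: "render" again onto fewer dimensions, with no precomputed cube.
by_region = render_as_aggregation(sales, ["region"], "amount", bins)
```

SUM is a distributive aggregate, so partial results can be combined in any order, which is what makes an order-independent accumulation of this kind possible. On a GPU the loop body would run once per rendered point rather than sequentially.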
Keywords/Search Tags: Database, graphics processor (GPU), parallel computing, multidimensional access, relational join, information visualization, visual analysis, data cubes