The Clustering And Correlation Analysis Of Data Stream Based On GPU

Posted on: 2016-03-12
Degree: Master
Type: Thesis
Country: China
Candidate: X B Qiu
Full Text: PDF
GTID: 2308330461976528
Subject: Software engineering
Abstract/Summary:
In practice, data takes two forms: static data and flowing data. Flowing data, also known as a data stream, is very difficult to mine because of its massive volume, rapid flow, real-time change, and single-scan constraint. With its great parallel computing capability and high memory bandwidth, the GPU has attracted a lot of attention as a new kind of stream processor.

This paper studies how to use the GPU for data stream mining. We put forward a general, eight-step framework for data stream mining based on a CPU-GPU heterogeneous model. In this model, the CPU works as the main processor and is responsible for scheduling and logic control, while the GPU works as a coprocessor responsible for data-intensive and compute-intensive operations. The paper describes the eight general steps of this model and analyzes its characteristics and advantages. Using this generic model, the paper then focuses on GPU-based parallel algorithms for data stream clustering and data stream correlation analysis.

For data stream clustering, the paper describes the basic concepts and the current research on its parallel acceleration. We focus on accelerating the K-means algorithm and put forward an improvement over the traditional parallel scheme. The improved scheme also parallelizes the center re-computation step, which increases the parallelism of K-means. We implement both parallel schemes on the GPU with CUDA; the improved scheme achieves a speedup of nearly 8 times over the traditional scheme.

For data stream correlation analysis, the paper first introduces the basic concepts, its parallel acceleration, and the current state of research. It then focuses on the canonical correlation analysis (CCA) algorithm and, motivated by the non-stationary nature of data streams, introduces the Kernel CCA algorithm into data stream processing for the first time. A GPU-based parallel acceleration method for Kernel CCA applied to data stream correlation analysis is then given. Finally, C++ and CUDA C are used to implement a serial CPU version and a parallel GPU version of this algorithm; the GPU version achieves a speedup of nearly 16 times over the CPU version.
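
The two K-means steps that the abstract discusses, assigning points to centers and re-computing the centers, can be illustrated with a minimal serial sketch. This is not the thesis's CUDA implementation, which the abstract does not show. It is plain Python written only to expose where the data parallelism lies: each point's assignment is independent of the others, and the center update is a per-cluster reduction (sums and counts). The function names `assign_labels`, `recompute_centers`, and `kmeans`, the 2-D point type, and the stopping rule are all assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def assign_labels(points: List[Point], centers: List[Point]) -> List[int]:
    """Assignment step: every point independently picks its nearest center.

    On a GPU this maps naturally to one thread per point (assumption).
    """
    labels = []
    for x, y in points:
        nearest = min(
            range(len(centers)),
            key=lambda k: (x - centers[k][0]) ** 2 + (y - centers[k][1]) ** 2,
        )
        labels.append(nearest)
    return labels


def recompute_centers(
    points: List[Point], labels: List[int], old_centers: List[Point]
) -> List[Point]:
    """Center step written as a reduction: per-cluster sums and counts.

    A parallel version would accumulate these sums with a parallel
    reduction; here they are accumulated serially. Empty clusters keep
    their previous center.
    """
    k = len(old_centers)
    sums = [[0.0, 0.0] for _ in range(k)]
    counts = [0] * k
    for (x, y), c in zip(points, labels):
        sums[c][0] += x
        sums[c][1] += y
        counts[c] += 1
    return [
        (sums[c][0] / counts[c], sums[c][1] / counts[c]) if counts[c] else old_centers[c]
        for c in range(k)
    ]


def kmeans(
    points: List[Point], centers: List[Point], max_iters: int = 100
) -> Tuple[List[Point], List[int]]:
    """Alternate the two steps until the labels stop changing."""
    labels: List[int] = []
    for _ in range(max_iters):
        new_labels = assign_labels(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
        centers = recompute_centers(points, labels, centers)
    return centers, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans(pts, [(0.0, 0.0), (10.0, 10.0)]))
```

In a scheme that parallelizes only the assignment step, `recompute_centers` would remain serial work on the CPU; writing it as sums and counts shows why it can also be moved to the GPU as a reduction.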
Keywords/Search Tags: Data Stream, GPU, Clustering, Correlation Analysis