Research On Acceleration Of Similarity Metric For Big Data

Posted on: 2017-03-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X W Xu
GTID: 1318330482494203
Subject: Electronic Science and Technology
Abstract/Summary:
Since the beginning of the 21st century, the amount of data has been growing exponentially with the rapid development of information technology. This vast amount of data brings new challenges to data mining, namely high throughput, high energy efficiency and variety, and in particular to its bottleneck, the similarity measure. However, because Complementary Metal-Oxide-Semiconductor (CMOS) scaling no longer provides gains in energy efficiency commensurate with increases in transistor density, the performance and energy efficiency of Central Processing Units (CPUs) fall far short of what is needed to process such huge volumes of data. We therefore select two typical similarity measures for performance optimization: Dynamic Time Warping (DTW) for stream data processing and Earth Mover's Distance (EMD) for image processing. Specifically, we optimize performance from three perspectives: architecture (digital circuits), device (analog circuits) and application (application domain).

First, to meet the high-performance and variety requirements of stream data mining, we introduce a scalable and parameterized FPGA-based DTW architecture for stream mining that incorporates k-Nearest Neighbor (kNN) search. Two processing element (PE) rings, one for DTW and one for kNN, are designed to achieve parameterization and scalability while retaining high performance.

Second, to handle task scheduling and data management on System on Chip (SoC) devices in stream data mining, we propose a general, pipelined DTW acceleration architecture on a CPU+FPGA heterogeneous platform, in which DTW serves as a sub-function for all higher-level tasks. Exploiting the potential parallelism of DTW, we implement a parameterized and pipelined DTW accelerator on the reconfigurable hardware. Software optimizations of DTW for different tasks run on the hard-core ARM processor, which calls the DTW accelerator as a sub-function. Through this collaboration of software optimization and hardware acceleration, the throughput and energy efficiency of the DTW acceleration architecture can be significantly improved.

Third, we realize DTW acceleration with high throughput and energy efficiency using a new device, the memristor. We present a novel analog memristor-based DTW architecture in which memristors are used for both computation and configuration. Because the computation proceeds in a continuous and asynchronous manner, we exploit the predictability of the DTW computing process to further improve throughput. Specifically, we develop an early lower bound algorithm for lower bound methods and an effective early termination algorithm for DTW calculation.

Fourth, we accelerate EMD with an instruction set extension on a CPU+FPGA heterogeneous platform. We first analyze the EMD algorithm and, after identifying its bottleneck, design extended instructions. Compared with previous hardware accelerators, the EMD acceleration architecture supports histograms with hundreds to thousands of bins.

Fifth, we optimize EMD specifically for a sleep posture recognition application. We present a matching-based approach, Body-Earth Mover's Distance (BEMD), for recognizing sleep postures with a pressure-sensitive bedsheet. BEMD first converts pressure images into histograms using three descriptors, then normalizes the histograms by body mass index (BMI), and finally combines EMD and Euclidean distance to evaluate the similarity of sleep postures. We also propose specific methods for pre-processing and classification: long-tail removal to eliminate redundant information, and a skew-based sleep posture classifier. Unlike existing work, the approach involves no local feature extraction.

In this dissertation, we study performance optimization for two widely used similarity measures: DTW for stream data processing and EMD for image processing. The research provides a deeper understanding from the perspectives of digital circuits, analog circuits and applications. The resulting performance improvements make data mining more efficient and help it meet the challenges of the data explosion in the big data era.
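For readers unfamiliar with the DTW and kNN computations that the FPGA and SoC architectures accelerate, here is a minimal software reference sketch. It is not the dissertation's hardware design. It computes classic DTW by dynamic programming and runs a brute-force kNN search that slides a query over a stream. The function names `dtw` and `knn_subsequences`, the squared-difference local cost and the final square root are assumptions chosen for illustration. The quadratic inner loop is the kind of work that a hardware DTW unit is meant to speed up.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Unconstrained DTW via dynamic programming, O(len(a) * len(b)) time."""
    n, m = len(a), len(b)
    # cost[i][j] holds the DTW cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (a[i - 1] - b[j - 1]) ** 2  # squared-difference local cost (assumption)
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b[j-1] repeats
                                     cost[i][j - 1],      # b advances, a[i-1] repeats
                                     cost[i - 1][j - 1])  # both advance
    return math.sqrt(cost[n][m])


def knn_subsequences(stream: Sequence[float], query: Sequence[float],
                     k: int) -> List[Tuple[float, int]]:
    """Brute-force kNN: the k windows of `stream` closest to `query` under DTW."""
    w = len(query)
    # Entries are (-distance, start), so heap[0] is the worst match currently kept.
    heap: List[Tuple[float, int]] = []
    for start in range(len(stream) - w + 1):
        d = dtw(stream[start:start + w], query)
        if len(heap) < k:
            heapq.heappush(heap, (-d, start))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, start))
    return sorted((-neg_d, start) for neg_d, start in heap)


stream = [0, 0, 1, 2, 1, 0, 0, 1, 2, 1, 0]
print(knn_subsequences(stream, [1, 2, 1], k=2))  # [(0.0, 2), (0.0, 7)]
```

Because DTW lets one sequence "repeat" samples, `dtw([0, 0, 1, 2], [0, 1, 2])` is 0 even though the lengths differ. That tolerance to local time shifts is what makes DTW attractive for stream mining, and it is also why each comparison costs a full dynamic-programming table.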
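The abstract mentions lower bound methods and early termination for DTW. The dissertation's own early lower bound algorithm targets the memristor architecture and is not reproduced here. As a hedged illustration of the general idea, the following sketch uses a different, well-known technique: the LB_Keogh lower bound with a Sakoe-Chiba band of radius `r`, plus row-wise early abandoning of DTW. The names `envelope`, `lb_keogh`, `dtw_early_abandon` and `nn_search` are hypothetical, and all distances are left squared so the bound is directly comparable.

```python
import math
from typing import List, Sequence, Tuple


def envelope(q: Sequence[float], r: int) -> Tuple[List[float], List[float]]:
    """Upper/lower envelope of q over a window of radius r (used by LB_Keogh)."""
    n = len(q)
    upper = [max(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    lower = [min(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    return upper, lower


def lb_keogh(c: Sequence[float], upper: Sequence[float], lower: Sequence[float]) -> float:
    """Squared LB_Keogh: lower-bounds banded squared DTW when the band radius matches."""
    total = 0.0
    for x, u, l in zip(c, upper, lower):
        if x > u:
            total += (x - u) ** 2
        elif x < l:
            total += (x - l) ** 2
    return total


def dtw_early_abandon(a: Sequence[float], b: Sequence[float], r: int,
                      best_so_far: float) -> float:
    """Squared DTW with a Sakoe-Chiba band; returns inf once it cannot beat best_so_far."""
    n = len(a)
    if len(b) != n:
        raise ValueError("this sketch assumes equal-length sequences")
    prev = [0.0] + [math.inf] * n
    for i in range(1, n + 1):
        curr = [math.inf] * (n + 1)  # cells outside the band stay inf
        for j in range(max(1, i - r), min(n, i + r) + 1):
            local = (a[i - 1] - b[j - 1]) ** 2
            curr[j] = local + min(prev[j], curr[j - 1], prev[j - 1])
        # Every warping path crosses row i and costs are non-negative,
        # so if the whole row already exceeds best_so_far, no path can win.
        if min(curr) > best_so_far:
            return math.inf
        prev = curr
    return prev[n]


def nn_search(stream: Sequence[float], query: Sequence[float],
              r: int) -> Tuple[float, int, int]:
    """1-NN subsequence search: prune with LB_Keogh, then run early-abandoning DTW."""
    w = len(query)
    upper, lower = envelope(query, r)
    best, best_start, pruned = math.inf, -1, 0
    for start in range(len(stream) - w + 1):
        window = stream[start:start + w]
        if lb_keogh(window, upper, lower) >= best:
            pruned += 1  # the cheap bound alone proves this window cannot be closer
            continue
        d = dtw_early_abandon(window, query, r, best)
        if d < best:
            best, best_start = d, start
    return best, best_start, pruned


print(nn_search([0, 0, 1, 2, 1, 0, 0, 5, 5, 5, 0], [1, 2, 1], r=1))  # (0.0, 2, 6)
```

The bound costs O(n) per window against O(nr) for banded DTW, so every pruned window saves most of the work. Once an exact match is found here, the six remaining windows are rejected by the bound alone.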
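For the EMD part, one special case has a simple closed form. For two 1-D histograms with equal total mass and ground distance |i - j| between bins i and j, EMD equals the accumulated absolute difference of the running (cumulative) masses, normalized by the total flow. The following sketch shows that case, followed by a nearest-template matcher that mixes EMD with Euclidean distance in the spirit of BEMD. The weight `alpha`, the linear combination, the toy histograms and all function names are assumptions. The abstract does not specify how BEMD combines the two distances, and the sketch does not model the three descriptors, the BMI normalization, long-tail removal or the skew-based classifier. General EMD over multi-dimensional histograms requires solving a transportation problem instead.

```python
import math
from typing import Dict, Sequence


def emd_1d(p: Sequence[float], q: Sequence[float]) -> float:
    """EMD between two 1-D histograms with equal total mass and ground distance |i - j|."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total = sum(p)
    if not math.isclose(total, sum(q)):
        raise ValueError("histograms must have equal total mass")
    if total == 0:
        return 0.0
    work, carry = 0.0, 0.0
    for pi, qi in zip(p, q):
        carry += pi - qi   # surplus mass that must cross the next bin boundary
        work += abs(carry)  # moving one unit of mass by one bin costs 1
    return work / total     # normalize by the total flow


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def combined_distance(p: Sequence[float], q: Sequence[float], alpha: float = 0.5) -> float:
    # Hypothetical linear mix; the abstract does not state how BEMD combines the two terms.
    return alpha * emd_1d(p, q) + (1 - alpha) * euclidean(p, q)


def classify(sample: Sequence[float], templates: Dict[str, Sequence[float]],
             alpha: float = 0.5) -> str:
    """Nearest-template matching: label of the template with the smallest combined distance."""
    return min(templates, key=lambda label: combined_distance(sample, templates[label], alpha))


# Toy, made-up histograms for illustration only (not data from the dissertation).
templates = {"left": [0.8, 0.2, 0.0], "supine": [0.2, 0.6, 0.2], "right": [0.0, 0.2, 0.8]}
print(classify([0.1, 0.3, 0.6], templates))  # right
```

EMD charges for how far mass must move, while Euclidean distance only compares bins in place. Under EMD, `[1, 0, 0]` is twice as far from `[0, 0, 1]` as from `[0, 1, 0]`, whereas Euclidean distance rates both pairs as equally far apart.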
Keywords/Search Tags: Big Data, Data Mining, Similarity Measure, Dynamic Time Warping, Earth Mover's Distance