
Research On Application Performance Optimization Methods For Big Data Processing

Posted on: 2020-01-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X C Hua
Full Text: PDF
GTID: 1368330572467312
Subject: Information and Communication Engineering
Abstract/Summary:
Big data is a distinctive phenomenon that has emerged with the rapid development of information technology, and its impact now reaches every aspect of people's lives. MapReduce applications and neural network applications are currently the representative means of extracting value from massive data. For MapReduce applications, Hadoop is the most mature framework: it provides a runtime environment for MapReduce applications as well as a wealth of configuration parameters that control how the applications run. However, because most users lack the necessary expertise, fine-tuning this configuration for a given application remains a major problem. Meanwhile, for both MapReduce and neural network applications, the major limitation on the performance and energy efficiency of traditional architectures lies in the data movement between the processing units and the main memory modules. To improve the performance of MapReduce- and neural-network-based big data applications, this dissertation studies the crucial performance optimization techniques from three perspectives: the software framework, the system architecture, and dedicated accelerator structures. The primary contributions of this dissertation are summarized as follows:

1) A performance-modeling-based Hadoop configuration tuning approach. MapReduce applications often cannot achieve optimal performance when Hadoop uses its default configuration, and a brute-force search of the configuration space is impractical because of its sheer size. To address these problems, a two-level performance model is built on the workflow of MapReduce applications; it uses ensemble modeling to capture the relationship between configuration parameters and application performance. A metaheuristic optimization algorithm is then employed to explore the parameter space. Experimental results show that the average error rate of the proposed performance model is 5.7%. The proposed approach achieves a 9.6x speedup over the default configuration and a 1.5x speedup over the state-of-the-art approach.

2) A near-data processing (NDP) approach based on dynamic task offloading. To address the performance and power challenges incurred by data movement, the proposed NDP approach leverages the capabilities of 3D memory and the data parallelism of the MapReduce model. The workflow of MapReduce workloads is decoupled to extract the key computation tasks, a task offloading mechanism dynamically migrates these computation tasks to NDP units, and atomic operations are employed to optimize memory accesses. Experimental results show that, for MapReduce workloads, the proposed approach confines 75% of data movements within the memory module, which significantly reduces the data movement between main memory and the host processors. Compared with the state-of-the-art approach, the proposed approach improves system performance and energy efficiency by 70% and 44%, respectively.

3) A memristor-based accelerator for convolutional neural networks (CNNs). The proposed in-memory computing accelerator exploits the ability of memristors to perform computation in addition to storing data. Its modules are designed to support full-fledged CNNs. A hybrid kernel mapping mechanism is employed to improve resource utilization. Spatially, it makes full use of the parallelism of convolution kernels, the reuse of input data, and the accumulation across channels to increase utilization. Temporally, the weights of the convolution layers are re-mapped to the available computation arrays according to each layer's amount of computation, which balances the pipeline. Results show that the hybrid kernel mapping mechanism achieves a 25.1x performance improvement on VGG-16, and the energy efficiency of the proposed approach is 25% higher than that of the state-of-the-art approach.

This dissertation explores application performance optimization methods for big data processing. The proposed design and optimization methods can provide guidance and practical solutions for optimizing the performance of big data applications.
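Contribution 1) pairs a two-level performance model with a metaheuristic search over Hadoop's configuration space. The sketch below is a minimal illustration under stated assumptions, not the dissertation's implementation. The three parameter names are real Hadoop configuration keys, and `DEFAULT` uses their stock default values. However, the candidate values, the closed-form per-phase formulas standing in for trained ensemble members, the map/shuffle/reduce split used for the two levels, and the choice of simulated annealing as the metaheuristic are all hypothetical.

```python
import math
import random
from statistics import mean
from typing import Callable, Dict, List

Config = Dict[str, int]
Model = Callable[[Config], float]

# Hypothetical discrete search space; the values are illustrative, not recommendations.
SPACE: Dict[str, List[int]] = {
    "mapreduce.task.io.sort.mb": [50, 100, 200, 400],
    "mapreduce.task.io.sort.factor": [5, 10, 25, 50, 100],
    "mapreduce.job.reduces": [1, 2, 4, 8, 16, 32],
}
# Hadoop's stock defaults for these three keys.
DEFAULT: Config = {
    "mapreduce.task.io.sort.mb": 100,
    "mapreduce.task.io.sort.factor": 10,
    "mapreduce.job.reduces": 1,
}

# Level 1: an ensemble of base predictors per phase. These synthetic formulas
# stand in for regressors that would be trained on profiling runs.
PHASE_ENSEMBLES: Dict[str, List[Model]] = {
    "map": [
        lambda c: 40 + 3000 / c["mapreduce.task.io.sort.mb"],
        lambda c: 45 + 2600 / c["mapreduce.task.io.sort.mb"],
    ],
    "shuffle": [
        lambda c: 20 + 300 / c["mapreduce.task.io.sort.factor"] + 0.2 * c["mapreduce.task.io.sort.factor"],
        lambda c: 25 + 250 / c["mapreduce.task.io.sort.factor"] + 0.3 * c["mapreduce.task.io.sort.factor"],
    ],
    "reduce": [
        lambda c: 600 / c["mapreduce.job.reduces"] + 2.0 * c["mapreduce.job.reduces"],
        lambda c: 550 / c["mapreduce.job.reduces"] + 2.5 * c["mapreduce.job.reduces"],
    ],
}


def predict_job_time(c: Config) -> float:
    """Level 2: sum the per-phase ensemble averages into a job-level estimate."""
    return sum(mean(m(c) for m in models) for models in PHASE_ENSEMBLES.values())


def neighbor(c: Config, rng: random.Random) -> Config:
    """Move one randomly chosen parameter to an adjacent value in its candidate list."""
    key = rng.choice(list(SPACE))
    values = SPACE[key]
    i = values.index(c[key])
    j = min(max(i + rng.choice([-1, 1]), 0), len(values) - 1)
    new = dict(c)
    new[key] = values[j]
    return new


def simulated_annealing(start: Config, steps: int = 2000, t0: float = 50.0, seed: int = 0) -> Config:
    """Search SPACE for the configuration with the lowest predicted job time."""
    rng = random.Random(seed)
    cur, cur_cost = start, predict_job_time(start)
    best, best_cost = cur, cur_cost
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9  # linear cooling schedule
        cand = neighbor(cur, rng)
        cand_cost = predict_job_time(cand)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if cand_cost < cur_cost or rng.random() < math.exp((cur_cost - cand_cost) / temp):
            cur, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = cur, cur_cost
    return best


if __name__ == "__main__":
    tuned = simulated_annealing(DEFAULT)
    print("default:", round(predict_job_time(DEFAULT), 2))
    print("tuned:  ", tuned, round(predict_job_time(tuned), 2))
```

Because the search only queries the model, each candidate costs a function call instead of a full job run. That cost difference is what makes exploring a configuration space too large for brute force practical.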
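Contribution 2) decides at runtime whether a decoupled MapReduce computation task runs on the NDP unit beside its data in 3D memory or on the host processor. A minimal sketch of such a dynamic offloading decision follows. It assumes a per-vault budget of in-flight work (`ndp_capacity`) as the offloading criterion. The class names, the capacity rule, and the byte accounting are hypothetical, and the atomic-operation optimization of memory accesses is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Task:
    name: str
    vault: int        # memory vault holding the task's input data
    input_bytes: int  # bytes the task reads
    ops: int          # abstract compute cost that occupies an NDP unit


@dataclass
class NdpSystem:
    num_vaults: int
    ndp_capacity: int  # hypothetical per-vault budget of in-flight ops
    ndp_load: Dict[int, int] = field(default_factory=dict)
    offloaded: Set[str] = field(default_factory=set)
    host_bytes: int = 0       # bytes that crossed the memory-to-host link
    in_memory_bytes: int = 0  # bytes consumed inside a memory module

    def dispatch(self, task: Task) -> str:
        """Decide where the task runs, based on the current NDP load of its vault."""
        if not 0 <= task.vault < self.num_vaults:
            raise ValueError(f"unknown vault {task.vault}")
        load = self.ndp_load.get(task.vault, 0)
        if load + task.ops <= self.ndp_capacity:
            # Offload: the NDP unit beside the data runs the task, so its input stays in memory.
            self.ndp_load[task.vault] = load + task.ops
            self.offloaded.add(task.name)
            self.in_memory_bytes += task.input_bytes
            return "ndp"
        # Fallback: the NDP unit is saturated, so the host fetches the data.
        self.host_bytes += task.input_bytes
        return "host"

    def complete(self, task: Task) -> None:
        """Release NDP capacity when an offloaded task finishes."""
        if task.name in self.offloaded:
            self.offloaded.remove(task.name)
            self.ndp_load[task.vault] -= task.ops

    def in_memory_ratio(self) -> float:
        """Fraction of input bytes that never left the memory modules."""
        total = self.host_bytes + self.in_memory_bytes
        return self.in_memory_bytes / total if total else 0.0


if __name__ == "__main__":
    system = NdpSystem(num_vaults=2, ndp_capacity=10)
    tasks = [Task("map0", 0, 400, 6), Task("map1", 0, 400, 6), Task("map2", 1, 200, 4)]
    print([system.dispatch(t) for t in tasks])  # ['ndp', 'host', 'ndp']
    print(system.in_memory_ratio())             # 0.6
```

Since the decision reads the live load of each vault, the same task may be offloaded on one run and executed on the host on another. That runtime behavior is what distinguishes dynamic offloading from a fixed static partition.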
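The temporal half of the hybrid kernel mapping in contribution 3) re-maps convolution weights onto spare computation arrays according to each layer's amount of computation, so that pipeline stages finish at similar rates. The sketch below rests on several assumptions. It assumes 128x128 crossbar arrays and counts one crossbar matrix-vector multiply (MVM) per output pixel. It also balances stages by greedily replicating the weights of the current bottleneck layer. The array size, layer shapes, and greedy rule are illustrative, and the spatial half of the mechanism is not modeled.

```python
import math
from dataclasses import dataclass
from typing import List

ARRAY_ROWS = 128  # assumed crossbar height (weight-matrix rows per array)
ARRAY_COLS = 128  # assumed crossbar width (weight-matrix columns per array)


@dataclass
class ConvLayer:
    name: str
    k: int      # kernel size (k x k)
    c_in: int
    c_out: int
    h_out: int
    w_out: int

    @property
    def mvms(self) -> int:
        """Crossbar MVMs per image with one weight copy: one per output pixel."""
        return self.h_out * self.w_out


def arrays_per_copy(layer: ConvLayer) -> int:
    """Arrays needed to hold one copy of the unrolled (k*k*c_in) x c_out weight matrix."""
    rows = layer.k * layer.k * layer.c_in
    return math.ceil(rows / ARRAY_ROWS) * math.ceil(layer.c_out / ARRAY_COLS)


def balance_pipeline(layers: List[ConvLayer], total_arrays: int) -> List[int]:
    """Greedily replicate the bottleneck layer's weights while spare arrays remain."""
    cost = [arrays_per_copy(l) for l in layers]
    spare = total_arrays - sum(cost)
    if spare < 0:
        raise ValueError("not enough arrays to hold one copy of every layer")
    copies = [1] * len(layers)
    while True:
        latency = [l.mvms / c for l, c in zip(layers, copies)]
        bottleneck = max(range(len(layers)), key=lambda i: latency[i])
        if cost[bottleneck] > spare:
            break  # replicating any other layer would not shorten the pipeline cycle
        copies[bottleneck] += 1
        spare -= cost[bottleneck]
    return copies


if __name__ == "__main__":
    layers = [
        ConvLayer("conv1", 3, 3, 64, 32, 32),
        ConvLayer("conv2", 3, 64, 128, 16, 16),
        ConvLayer("conv3", 3, 128, 256, 8, 8),
    ]
    print(balance_pipeline(layers, total_arrays=40))  # [9, 2, 1]
```

In this toy setting the early layer has few weights but many output pixels, so it is cheap to replicate and dominates the pipeline. With 40 arrays, the slowest stage drops from 1024 to 128 MVMs per image.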
Keywords/Search Tags: big data application, performance optimization, Hadoop configuration, data movement