
Research On Job Runtime Characteristics Based Performance Optimization In Big Data Processing System

Posted on: 2019-07-06    Degree: Master    Type: Thesis
Country: China    Candidate: L Lu    Full Text: PDF
GTID: 2428330563990961    Subject: Computer software and theory
Abstract/Summary:
To meet the growing demand for big data analysis and processing, communities from academia and industry have designed and developed a variety of big data processing systems. These systems decouple the logic of data analysis applications from the underlying job execution environment, so that users can efficiently develop, deploy, and execute their jobs in data centers and clouds. They provide data-parallel programming models (e.g., MapReduce) that offer intuitive, application-oriented programming primitives and simplify application development. Moreover, the underlying system implementations shield users from the details of massively parallel distributed execution, including task partitioning, resource scheduling, node failure, and hardware heterogeneity. The application scope of big data analysis has shifted from off-line batch processing to latency-sensitive interactive computing, streaming computing, and complex iterative computing. It is therefore important to optimize performance according to the runtime characteristics of every aspect of job execution.

Three problems motivate this work. First, the resources of many distributed computing platforms are geographically dispersed, heterogeneous, and highly dynamic, and existing data placement strategies and task scheduling algorithms are not efficient on these wide-area platforms. Second, current functional data-parallel frameworks avoid the recomputation of intermediate results by caching data in memory, which creates a large number of long-lived data objects in the managed heap; frequently triggered garbage collection then causes long-latency pauses and significant additional overheads. Third, current distributed graph computing systems scale poorly on large heterogeneous platforms, which degrades the efficiency of graph processing.

For the problems of wide-area distributed computing platforms, we propose a novel resource-aware task scheduling mechanism based on a data-affinity scheduling algorithm. In detail, our method tolerates short-term data transfer failures, avoids unnecessary re-execution of tasks, and applies aggressive speculative scheduling to straggler tasks to accelerate job execution. To validate the proposed method, we design and implement a wide-area distributed computing emulator and an experiment framework based on the Grid5000 platform, and compare our method with Hadoop in terms of scalability, load balance, and fault tolerance.

At the framework runtime system level, we propose a lifetime-based memory management mechanism that uses just-in-time program transformation techniques to reduce the memory footprint and garbage collection overheads during job execution. To guarantee transformation safety, the proposed method uses static program analysis to extract the lifetime and memory occupancy characteristics of data objects, ensuring that the original program semantics are not changed. Moreover, by rearranging the memory layout of objects belonging to the same data set into contiguous regions grouped by class fields, the mechanism also increases the compression ratio of in-memory data and further reduces the memory consumption of tasks.

At the application-specific layer, we propose a graph-aware dynamic load balancing mechanism with a new two-level graph partitioning algorithm. In detail, the input graph dataset is first divided at the computing-node granularity; all node-level partitions are then further divided into subgraph partitions, which serve as the basic migration units, and each computing node is guaranteed to have "transition affinity" with another node. Based on this partitioning mechanism, our method uses a dynamic load balancing strategy that avoids a costly global re-partitioning algorithm. In addition, by using remote vertex replicas, the mechanism implements a differential-backup checkpoint mechanism that shortens the recovery time of job progress.
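The resource-aware, data-affinity scheduling idea can be illustrated with a small sketch. The following Java code is not the thesis implementation; the class names, scoring weights, and the 50% straggler threshold are assumptions chosen for illustration. It scores candidate nodes by data locality, free resources, and link stability, and launches a speculative copy for running tasks that fall far behind the average progress rate.

```java
// Illustrative sketch of resource-aware, data-affinity scheduling with
// speculative execution of stragglers. All names and thresholds here are
// assumptions for illustration, not the thesis implementation.
import java.util.*;

public class AffinityScheduler {

    static class NodeState {
        final String id;
        final Set<String> cachedBlocks;   // data blocks resident on the node
        final double freeCpu;             // fraction of CPU currently free
        final double linkQuality;         // 0..1, penalizes flaky wide-area links
        NodeState(String id, Set<String> blocks, double freeCpu, double linkQuality) {
            this.id = id; this.cachedBlocks = blocks;
            this.freeCpu = freeCpu; this.linkQuality = linkQuality;
        }
    }

    static class TaskState {
        final String id;
        final String inputBlock;          // block the task wants to read
        double progressRate;              // progress per second, from heartbeats
        boolean speculativeCopyLaunched = false;
        TaskState(String id, String inputBlock) { this.id = id; this.inputBlock = inputBlock; }
    }

    /** Pick the node with the best combined data-affinity / resource score. */
    NodeState place(TaskState task, List<NodeState> nodes) {
        return nodes.stream()
                .max(Comparator.comparingDouble(n -> score(task, n)))
                .orElseThrow();
    }

    private double score(TaskState task, NodeState node) {
        double affinity = node.cachedBlocks.contains(task.inputBlock) ? 1.0 : 0.0;
        // Weight data locality highest, then free resources, then link stability.
        return 0.6 * affinity + 0.3 * node.freeCpu + 0.1 * node.linkQuality;
    }

    /** Launch a speculative copy for tasks far slower than the running average. */
    List<TaskState> pickStragglers(Collection<TaskState> running) {
        double avg = running.stream().mapToDouble(t -> t.progressRate).average().orElse(0.0);
        List<TaskState> stragglers = new ArrayList<>();
        for (TaskState t : running) {
            if (!t.speculativeCopyLaunched && t.progressRate < 0.5 * avg) {
                t.speculativeCopyLaunched = true;   // avoid duplicate speculation
                stragglers.add(t);
            }
        }
        return stragglers;
    }

    public static void main(String[] args) {
        AffinityScheduler sched = new AffinityScheduler();
        NodeState a = new NodeState("siteA", Set.of("block-1"), 0.4, 0.9);
        NodeState b = new NodeState("siteB", Set.of("block-2"), 0.9, 0.6);
        TaskState t1 = new TaskState("map-1", "block-1");
        System.out.println("map-1 -> " + sched.place(t1, List.of(a, b)).id); // siteA: locality wins
    }
}
```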
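The field-grouped object layout behind the lifetime-based memory management mechanism can likewise be sketched. The code below packs the fields of a cached dataset into a few contiguous primitive arrays instead of many small long-lived objects; the Record and ColumnarBatch types are hypothetical, and the actual mechanism additionally relies on static lifetime analysis and just-in-time program transformation, which a short example cannot capture.

```java
// Sketch of grouping a cached dataset's fields into contiguous regions:
// a handful of large arrays instead of N long-lived heap objects, which
// reduces garbage collection work and improves memory compression.
// Record and ColumnarBatch are illustrative assumptions, not the thesis API.
import java.util.List;

public class ColumnarCache {

    // A cached record as a naive object graph: one heap object per record.
    record Record(long key, double value) {}

    // The same data packed by class field into contiguous primitive arrays.
    static class ColumnarBatch {
        final long[] keys;
        final double[] values;
        ColumnarBatch(List<Record> records) {
            keys = new long[records.size()];
            values = new double[records.size()];
            for (int i = 0; i < records.size(); i++) {
                keys[i] = records.get(i).key();
                values[i] = records.get(i).value();
            }
        }
        Record get(int i) { return new Record(keys[i], values[i]); } // materialize on demand
        int size() { return keys.length; }
    }

    public static void main(String[] args) {
        List<Record> cached = List.of(new Record(1, 0.5), new Record(2, 0.7), new Record(3, 0.9));
        ColumnarBatch batch = new ColumnarBatch(cached);
        // Two long-lived arrays instead of three (or millions of) objects:
        // less GC scanning, and same-typed neighbouring values compress better.
        System.out.println(batch.size() + " records, key[2]=" + batch.get(2).key());
    }
}
```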
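Finally, a minimal sketch of two-level partitioning with subpartition migration: vertices are first assigned to compute nodes, then to fixed subpartitions inside each node, and load is rebalanced by moving whole subpartitions from the most loaded node to the least loaded one rather than re-partitioning globally. The subpartition count, the 10% imbalance threshold, and the class names are illustrative assumptions.

```java
// Sketch of two-level graph partitioning with subpartitions as migration
// units. Constants and names are assumptions for illustration only.
import java.util.*;

public class TwoLevelPartitioner {
    static final int SUBPARTS_PER_NODE = 4;

    // subpartition id -> vertex ids it owns
    final Map<Integer, List<Long>> subpartitions = new HashMap<>();
    // node id -> subpartition ids currently hosted there
    final Map<Integer, List<Integer>> placement = new HashMap<>();

    TwoLevelPartitioner(List<Long> vertices, int numNodes) {
        for (int s = 0; s < numNodes * SUBPARTS_PER_NODE; s++) subpartitions.put(s, new ArrayList<>());
        for (long v : vertices) {
            // Level 1: node-level assignment; Level 2: subpartition within that node.
            int node = (int) (v % numNodes);
            int sub = node * SUBPARTS_PER_NODE + (int) ((v / numNodes) % SUBPARTS_PER_NODE);
            subpartitions.get(sub).add(v);
        }
        for (int n = 0; n < numNodes; n++) {
            List<Integer> owned = new ArrayList<>();
            for (int k = 0; k < SUBPARTS_PER_NODE; k++) owned.add(n * SUBPARTS_PER_NODE + k);
            placement.put(n, owned);
        }
    }

    long load(int node) {
        return placement.get(node).stream().mapToLong(s -> subpartitions.get(s).size()).sum();
    }

    /** Migrate one subpartition from the most loaded node to the least loaded one. */
    void rebalance() {
        int busiest = Collections.max(placement.keySet(), Comparator.comparingLong(this::load));
        int idlest = Collections.min(placement.keySet(), Comparator.comparingLong(this::load));
        if (load(busiest) <= 1.1 * load(idlest)) return;   // within 10%: do nothing
        // Move the smallest subpartition: the cheapest migration that helps.
        Integer moved = Collections.min(placement.get(busiest),
                Comparator.comparingInt(s -> subpartitions.get(s).size()));
        placement.get(busiest).remove(moved);
        placement.get(idlest).add(moved);
    }

    public static void main(String[] args) {
        List<Long> vertices = new ArrayList<>();
        for (long v = 0; v < 1000; v++) vertices.add(v);
        TwoLevelPartitioner p = new TwoLevelPartitioner(vertices, 4);
        p.rebalance();   // no-op here: the hash assignment is already balanced
        System.out.println("node 0 load = " + p.load(0));
    }
}
```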
Keywords/Search Tags: Distributed data processing, Data parallel model, Task scheduling, Fault tolerance, Memory management, Garbage collection, Graph partitioning, Dynamic load balancing