
The Research Of Big Data Processing And Resource Scheduling For Heterogeneous Computing

Posted on: 2017-03-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Z Wang
GTID: 1368330569998496
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Big Data and the growing complexity of analysis algorithms, the demand for computing performance in data processing systems is becoming increasingly prominent. In this context, heterogeneous computing has received more and more attention, and high performance is becoming one of the most important design requirements for data processing systems. At the same time, the MapReduce distributed system has become increasingly popular for data processing and analysis and plays an important role in practical applications. Studying MapReduce techniques on heterogeneous computing platforms is therefore important for improving the performance and efficiency of data processing systems.

Although the MapReduce model is largely independent of the underlying hardware platform, the complex heterogeneous computing environment brings new challenges for system design, optimization, and the fair allocation of computing resources. Moreover, with the wide use of SIMD (Single Instruction Multiple Data) vector techniques in heterogeneous computing, achieving high-performance task computation in a MapReduce system has become considerably more difficult. This dissertation therefore studies the design and optimization of a MapReduce system based on heterogeneous computing and SIMD vector techniques, and proposes scheduling algorithms for the fair allocation of heterogeneous resources. The main contributions are as follows:

(1) We design micMR, a MapReduce data processing framework based on heterogeneous computing. First, we establish a model, H-MUD, based on the characteristics of MapReduce, to describe heterogeneous MapReduce data streams. Second, guided by the characteristics of the model and of heterogeneous computing, and by the optimization goals, we design the heterogeneous MapReduce framework micMR, I/O optimization techniques based on the system bus, and support for scaling across coprocessors. Finally, we analyze the performance of the framework and of the optimization techniques in a heterogeneous computing environment. The experimental results show that micMR performs well on heterogeneous hardware and that the optimization techniques are effective.

(2) We study the optimization and implementation of a MapReduce system in a SIMD heterogeneous computing environment. First, we carry out a quantitative evaluation and analysis of the Intel MIC coprocessor. Then, we study optimization techniques for SIMD computing in the MapReduce framework, covering map task computation, data storage, and the hash algorithm, which improve computational performance on the MIC coprocessor. Finally, micMR is implemented on a CPU-MIC heterogeneous computing node, and the performance of the vector optimization techniques is analyzed. The experimental results show that the performance of micMR is 8.4 to 45.8 times higher than that of Phoenix++.

(3) We study fair scheduling algorithms for heterogeneous computing resources. We first study the weighted DRF algorithm and propose a weight calculation method based on real-time statistics of the system state, which improves task computing performance. We then propose a fair scheduling algorithm based on dominant fairness for multi-user hierarchical scheduling, which ensures a fair distribution of resources, improves resource utilization, and shortens task completion time. Both scheduling algorithms are implemented on a CPU-MIC heterogeneous cluster, and the experimental workloads follow the characteristics of the Facebook trace. The experimental results show that, compared with the DRF algorithm, the weighted DRF algorithm improves computational performance by 19.2%. The dominant fairness scheduling algorithm guarantees fair resource allocation, effectively avoids task starvation and idle computing resources, and improves computing performance by 18.5% compared with the H-DRF algorithm.

(4) We implement the micMR system on a large-scale heterogeneous computing cluster. By building on the Hadoop system, micMR can manage a large-scale cluster. In addition, a coprocessor fault tolerance mechanism handles coprocessor failures. Finally, a cloud computing platform based on container technology makes the micMR system easy to manage and deploy. System implementation and experimental analysis on the Tianhe-II platform show that the performance of micMR is 2.0 to 5.1 times higher than that of Hadoop in the cluster environment.
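For readers unfamiliar with the DRF family of algorithms mentioned in contribution (3), the following is a minimal Python sketch of weighted DRF using progressive filling. It is not the dissertation's algorithm: the weight calculation from real-time system statistics is not described in the abstract, so the sketch assumes fixed per-user scalar weights supplied by the caller. The function name `weighted_drf`, the users `A` and `B`, and the resource figures are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def weighted_drf(
    capacity: List[float],
    demands: Dict[str, List[float]],
    weights: Dict[str, float],
) -> Dict[str, int]:
    """Illustrative weighted DRF by progressive filling.

    capacity: total amount of each resource (e.g. [CPUs, GB of memory]).
    demands:  per-task demand vector of each user, same order as capacity.
              Assumes every demand vector has at least one positive entry.
    weights:  hypothetical fixed per-user weights (an assumption; the
              dissertation derives weights from runtime statistics).
    Returns the number of tasks granted to each user.
    """
    used = [0.0] * len(capacity)
    tasks = {user: 0 for user in demands}
    active = sorted(demands)  # sorted for deterministic tie-breaking

    def share(user: str) -> float:
        # Dominant share: the largest fraction of any single resource
        # held by the user, scaled down by the user's weight.
        dominant = max(
            tasks[user] * d / c for d, c in zip(demands[user], capacity)
        )
        return dominant / weights[user]

    while active:
        # Serve the user with the lowest weighted dominant share.
        user = min(active, key=share)
        after = [u + d for u, d in zip(used, demands[user])]
        if all(a <= c for a, c in zip(after, capacity)):
            used = after
            tasks[user] += 1
        else:
            # The next task of this user no longer fits; stop serving it.
            active.remove(user)
    return tasks


# Hypothetical cluster: 9 CPUs and 18 GB of memory, two users, equal weights.
example = weighted_drf(
    capacity=[9.0, 18.0],
    demands={"A": [1.0, 4.0], "B": [3.0, 1.0]},
    weights={"A": 1.0, "B": 1.0},
)
print(example)  # {'A': 3, 'B': 2}
```

With equal weights this reduces to plain DRF: user A ends with 12 of 18 GB and user B with 6 of 9 CPUs, so both hold a dominant share of two thirds. Raising a user's weight lowers its weighted share and lets it be served earlier when resources are contended.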
Keywords/Search Tags:Heterogeneous Computing, Many Integrated Core, Big Data, MapReduce, SIMD Computing, Heterogeneous Resource Scheduling