Task Scheduling Research And Application Of Big Data In Distributed Environment

Posted on: 2017-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: B Zhao
GTID: 2308330482987122
Subject: Communication and Information System
Abstract/Summary:
In recent years, the volume of data on the Internet has grown explosively with the rapid development of computer and information technology. Traditional data processing techniques cannot provide adequate storage and computational resources for massive data, so techniques oriented to massive data have become a new research hotspot. As an efficient distributed programming model, MapReduce is one of the current mainstream technologies for large-scale data processing. The distributed scheduling algorithm, a core component of the model, has a direct impact on the performance of MapReduce.

This thesis studies the theoretical basis of MapReduce and proposes a Dynamic Delay Scheduling Algorithm Based on Task Classification (TCDDS Algorithm). Simulation results demonstrate the effectiveness of the algorithm. Building on it, this thesis introduces Deep Packet Inspection Based on TCDDS, which improves the processing rate and performance of deep packet inspection (DPI) by combining it with MapReduce parallel processing. The main work of this thesis covers three aspects, as follows.

Firstly, this thesis studies the architecture and key techniques of MapReduce, especially the task scheduling procedure. It also analyzes several popular task scheduling algorithms: the FIFO Algorithm, the Capacity Scheduler, the Fair Scheduler and the Delay Scheduling Algorithm.

Secondly, after analyzing the shortcomings of the existing task scheduling algorithms, this thesis proposes the Dynamic Delay Scheduling Algorithm Based on Task Classification (TCDDS Algorithm). Starting from the original delay scheduling algorithm, TCDDS adds a task classification step to the scheduling process, using the fuzzy comprehensive evaluation method. It classifies all tasks into three categories: high level, medium level and low level. Each category has its own waiting time threshold, so that high level tasks obtain a shorter completion time while low level tasks obtain higher data-locality. This thesis then carries out a series of simulations to verify the effectiveness of the algorithm. Experimental results show that the TCDDS Algorithm not only improves the data-locality of MapReduce but also reduces the response time of the whole job, thus improving MapReduce performance.

Finally, this thesis proposes the Deep Packet Inspection Technology Based on TCDDS. This technology improves the processing rate and performance of deep packet inspection by combining it with the MapReduce distributed computing framework optimized by TCDDS. Simulation results show that the technology is more effective than the original DPI.
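
The two ideas in the abstract, classifying tasks by fuzzy comprehensive evaluation and giving each class its own waiting time threshold, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the evaluation factors, the weights, the membership values, the threshold values and the names `classify`, `Task` and `pick_task` are all hypothetical, and the weighted-average operator is only one common choice for fuzzy comprehensive evaluation.

```python
from dataclasses import dataclass

LEVELS = ("high", "medium", "low")

# Hypothetical per-level wait thresholds in seconds (not taken from the thesis).
# High level tasks give up on locality quickly; low level tasks wait longer.
WAIT_THRESHOLD = {"high": 1.0, "medium": 3.0, "low": 6.0}


def classify(weights, membership):
    """Fuzzy comprehensive evaluation with the weighted-average operator.

    weights: one weight per evaluation factor (assumed to sum to 1).
    membership: one row per factor, giving that factor's membership
        degree in each of the three levels.
    Returns the level with the largest combined membership.
    """
    scores = [
        sum(w * row[j] for w, row in zip(weights, membership))
        for j in range(len(LEVELS))
    ]
    return LEVELS[scores.index(max(scores))]


@dataclass
class Task:
    name: str
    level: str
    data_nodes: frozenset  # nodes that hold this task's input split
    waited: float = 0.0    # time spent being passed over for non-local slots


def pick_task(queue, node, elapsed):
    """Return the task to launch on a free slot of `node`, or None.

    Tasks are examined in queue order. A task is launched if `node` holds
    its data, or if it has already waited at least its level's threshold.
    Otherwise it is passed over and `elapsed` (time since the previous
    scheduling opportunity) is added to its wait.
    """
    for task in queue:
        if node in task.data_nodes or task.waited >= WAIT_THRESHOLD[task.level]:
            queue.remove(task)
            return task
        task.waited += elapsed
    return None
```

With these assumed thresholds, a high level task whose data is on another node is launched non-locally after being passed over for one second, while a low level task keeps waiting for a local slot for up to six seconds.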
Keywords/Search Tags: MapReduce, Hadoop, Delay Scheduling, Data-locality, Fuzzy Sets