Data security, privacy, and trust issues arise widely in many fields, such as bioengineering, intelligent manufacturing, modern agriculture, intelligent medicine, and public security. Facing the rapid growth of Big Data, how to store and process it quickly and effectively is a central problem, which depends on the data analysis framework and the corresponding computing environment; MapReduce and Spark running in cloud computing environments are typical representatives. In the Big Data environment, data blocks are used as the units of storage and processing. In different computing frameworks, the constraints between tasks are not always independent: tasks in the MapReduce framework are under linear constraints, while workflow tasks in the Spark framework are under non-linear constraints. Applications may also have deadline requirements. How to schedule these tasks onto geographically distributed and heterogeneous cloud computing resources quickly and stably is the key problem in practical applications. Different requirements such as security, privacy, and trust greatly increase the scheduling difficulty. This dissertation studies Big Data task scheduling problems with security, privacy, trust, and outage constraints, together with other practical application requirements, which are theoretically significant and widespread in practice. The main contributions of this dissertation are as follows:

(1) A distributed MapReduce task scheduling problem with security and outage constraints is considered, and an algorithm framework for this problem is proposed. The framework includes three algorithm components: task matching, queue sorting, and outage checking. Firstly, a task-resource matching strategy considering task priority and data security is investigated for the map phase. Secondly, a mechanism is proposed to sort the assigned map tasks before the start of the reduce phase, and the earliest-available-time priority rule is used to sort the reduce tasks. Thirdly, a method is established to reschedule the tasks on the outage
nodes to adjust the mapping process according to the outage probability. Based on random instances, the parameters of the algorithm are calibrated by the analysis-of-variance technique. The algorithm is compared on benchmark test sets, and the comparison results show that the performance of the proposed algorithm is strongly affected by the outage probability and the number of nodes.

(2) A cloud workflow scheduling problem with a trust constraint is considered, and a general trust model based on direct trust and recommendation trust is established. An iterative-adjustment heuristic algorithm framework is proposed. There are three algorithm components in the framework: initial solution generation, candidate solution construction, and result adjustment. In the first component, three heuristic candidate rules are established. A local search with two candidate-solution strategies is constructed for the second component. Finally, a max-min solution adjustment strategy, based on minimizing the total cost-trust ratio, is investigated. Based on random instances, the parameters of the algorithm are calibrated by the analysis-of-variance technique. On benchmark test sets, the calibrated algorithm is compared with modified existing algorithms for similar scheduling problems, and the experimental results verify the superiority of the proposed algorithm.

(3) A hybrid-cloud workflow scheduling problem with privacy data is considered, and a corresponding algorithm framework is proposed. The framework includes four algorithm components: deadline dividing, stage sorting, task scheduling, and result adjustment. Three different candidate rules are used in deadline dividing. Three simple candidate strategies are established for stage sorting. Scheduling strategies are proposed for privacy and non-privacy tasks, respectively. To minimize the rental cost, a result-adjustment method that searches for idle slots to
improve the utilization of virtual machines is constructed. Based on random instances, the analysis-of-variance technique is used to calibrate the parameters of the algorithm. Five scientific-workflow benchmark test sets with different structures are used to compare the algorithms, and the experimental results show that the proposed algorithm performs better than the comparison algorithms in most cases.

(4) A Big Data task scheduling problem with data affinity is considered, and a corresponding algorithm framework is proposed. The framework contains four algorithm components. Firstly, stages are sorted according to the priority of their time parameters. Secondly, tasks performed in parallel within stages are sorted. Thirdly, four virtual-machine search strategies are designed to allocate resources to tasks. Finally, the solution is optimized by adjusting the stage sequence and using idle time slots. The optimal configuration of each component in the algorithm framework is determined by simulation experiments. The efficiency of the proposed algorithm is demonstrated by comparison with two modified similar algorithms on two kinds of scientific workflow instances.
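The frameworks above share a common list-scheduling skeleton: tasks are ordered by a priority rule and each is placed on the resource that finishes it earliest. The following is a minimal sketch of that pattern; the `Task`/`VM` fields, the priority rule, and all concrete values are illustrative assumptions, not the dissertation's exact algorithms.

```python
# Minimal list-scheduling sketch: tasks sorted by priority, each assigned
# to the virtual machine that completes it earliest. Fields and values are
# hypothetical, for illustration only.
from dataclasses import dataclass

@dataclass
class VM:
    name: str
    speed: float          # work units processed per second
    available_at: float = 0.0

@dataclass
class Task:
    name: str
    work: float           # total work units
    priority: float       # e.g. an earliest-available-time or rank value

def schedule(tasks, vms):
    """Assign each task, highest priority first, to its earliest-finishing VM."""
    plan = []
    for task in sorted(tasks, key=lambda t: -t.priority):
        best = min(vms, key=lambda vm: vm.available_at + task.work / vm.speed)
        finish = best.available_at + task.work / best.speed
        best.available_at = finish          # VM is busy until this task finishes
        plan.append((task.name, best.name, finish))
    return plan

vms = [VM("vm1", speed=1.0), VM("vm2", speed=2.0)]
tasks = [Task("t1", work=4, priority=3),
         Task("t2", work=2, priority=2),
         Task("t3", work=6, priority=1)]
for name, vm, finish in schedule(tasks, vms):
    print(name, vm, finish)
```

Deadline and security constraints would enter as filters on the candidate VMs before the `min` selection; the greedy earliest-finish choice is what the result-adjustment components then refine using idle slots.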
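The trust model of contribution (2) combines direct trust with recommendation trust. A common way to realize such a model is a weighted aggregation; the sketch below assumes success-rate direct trust, averaged recommendations, and a weight `alpha`, all of which are illustrative choices rather than the dissertation's exact definitions.

```python
# Hypothetical trust aggregation: weighted combination of direct trust
# (own interaction history) and recommendation trust (third-party reports).
def direct_trust(interactions):
    """Fraction of successful past interactions (entries are 1/0)."""
    if not interactions:
        return 0.0
    return sum(interactions) / len(interactions)

def recommendation_trust(recommendations):
    """Average trust value reported by recommender nodes (values in [0, 1])."""
    if not recommendations:
        return 0.0
    return sum(recommendations) / len(recommendations)

def overall_trust(interactions, recommendations, alpha=0.7):
    """Weight direct evidence by alpha, recommendations by (1 - alpha)."""
    return (alpha * direct_trust(interactions)
            + (1 - alpha) * recommendation_trust(recommendations))

# A scheduler would only place a workflow task on nodes whose overall
# trust meets the task's required threshold.
node_trust = overall_trust([1, 1, 0, 1], [0.8, 0.6], alpha=0.7)
print(round(node_trust, 3))
```

Under a trust constraint, such a value acts as a feasibility filter during task-to-node matching, while the max-min adjustment trades residual trust slack against total cost.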