
Research And Implementation Of Task Reliability In Dispersed Environment

Posted on: 2023-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: J B Zhang
Full Text: PDF
GTID: 2558306908965089
Subject: Computer Science and Technology
Abstract/Summary:
As Internet technology continues to develop, IoT devices of many types are highly dispersed geographically and run in their own independent systems, so their resources lack unified management and are used inefficiently. Dispersed computing (DCOMP) emerged in this context. Through DCOMP middleware, it dynamically groups terminal devices with different functions, based on network and geographic location information, to execute computing tasks within a local scope. This avoids the communication delay of traditional cloud computing and is better suited to real-time computing tasks that interact strongly with the environment. Because dispersed resources join and leave dynamically and a changeable environment makes task execution uncertain, ensuring highly reliable task execution in a dispersed network has become an urgent problem.

This thesis targets reliable task execution in a dispersed environment and pursues two research directions: reliable task scheduling and fault-tolerant scheduling. On the one hand, a dispersed prototype system is designed and implemented; it provides a basic task scheduling and execution mechanism along with some guarantees for reliable task scheduling. On the other hand, a dispersed network environment model and a task scheduling model are constructed according to the characteristics of the dispersed network environment, and a task execution plan table is built to analyze, in the form of a schedule, the execution queues of tasks on each node, so that each task uses resources rationally.

For reliable scheduling, the failure rates of computing nodes and their communication links are modeled first, so that the reliability of a scheduling plan can be analyzed and evaluated quantitatively. A priority scheduling algorithm with reliability as its objective, R-PSA, is then proposed. It first orders the subtasks of the DAG by a reliability rank and then, for each subtask, finds the node that maximizes the subtask's execution reliability while accounting for reliable data transmission. To avoid local optima, the thesis also proposes R-IFSA, an improved version of the search-based fireworks algorithm, which iteratively searches for the most reliable scheduling strategy by applying operations such as explosion and Gaussian mutation to candidate strategies. Simulation experiments comparing against classical scheduling algorithms such as random scheduling and HEFT verify the advantages of the proposed algorithms in terms of reliability.

To address the single point of failure that arises when each task has only one replica, the thesis first analyzes how the passive replication and active replication fault-tolerance strategies affect task reliability. It then constructs a correlated node failure model by analyzing the common-cause failure and dependent failure relationships among nodes, so as to avoid the high failure rate of multiple versions caused by correlated node failures in a dispersed environment. Next, a fault-tolerant algorithm for DAG tasks without delay constraints, FRSA-Max R-CF, is studied, together with a group fault-tolerance mechanism that schedules subtasks of the same group in parallel within the slack time; scheduling combines active and passive replication to ensure the overall reliability of the DAG task. For DAG tasks with delay constraints, an algorithm FTSA-Deadline-CF is proposed that allocates slack time proportionally to critical and non-critical subtasks and replicates each subtask within its deadline, yielding a highly reliable fault-tolerant scheduling strategy within the deadline of the DAG task. Finally, simulation experiments verify that the proposed fault-tolerant scheduling algorithms significantly improve task reliability.
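
The abstract does not give the failure model's formulas, so the following is only a minimal sketch of how a scheduling plan's reliability might be evaluated from failure rates. It assumes a constant failure rate per node or link (so survival over a duration t is exp(-rate * t)) and independent failures; both are common simplifications in the scheduling literature, not details confirmed by the thesis. All function names and the sample numbers are hypothetical.

```python
import math


def execution_reliability(failure_rate, duration):
    """Probability that one node (or link) stays up for `duration`.

    Assumption: a constant failure rate, i.e. exponentially
    distributed time to failure.
    """
    return math.exp(-failure_rate * duration)


def schedule_reliability(assignments):
    """Reliability of a whole plan without replication.

    `assignments` is an iterable of (failure_rate, duration) pairs,
    one per subtask execution or data transfer. Assumption: failures
    are independent, so the individual reliabilities multiply.
    """
    return math.exp(-sum(rate * duration for rate, duration in assignments))


def replicated_reliability(replica_reliabilities):
    """Reliability of a subtask run as several active replicas.

    The subtask succeeds if at least one replica succeeds.
    Assumption: replicas fail independently of each other.
    """
    all_fail = 1.0
    for reliability in replica_reliabilities:
        all_fail *= 1.0 - reliability
    return 1.0 - all_fail


# Hypothetical plan: two subtask executions and one data transfer.
plan = [(1e-3, 10.0), (5e-4, 4.0), (2e-3, 5.0)]
print(schedule_reliability(plan))
print(replicated_reliability([0.9, 0.9]))
```

The independence assumption in `replicated_reliability` is exactly what correlated failures violate: if two replicas sit on nodes that tend to fail together, the real gain from replication is smaller than this product form suggests, which is the situation the thesis's correlated failure model is meant to capture.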
Keywords/Search Tags: Dispersed Computing, Reliability, Fault Tolerance, DAG, Correlated Failure