
Design Of Distributed Computing Task Clustering Scheduling Algorithm In Heterogeneous Cluster

Posted on: 2021-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: M R Xu
Full Text: PDF
GTID: 2428330611481897
Subject: Engineering
Abstract/Summary:
In recent years, both record files in e-commerce and application programs for high-performance scientific computing have generated a colossal amount of data, commonly termed “big data”. The widespread use of Internet of Things (IoT) devices has also led to the collection of data at similar scales. Such datasets must be processed and analyzed for business improvement, knowledge discovery, and scientific innovation. Parallel and distributed computing in cloud environments has proved to be an effective solution for handling big data. However, due to the rapidly growing data volume and the ever-increasing complexity and scale of computing tasks, one may have to add more physical nodes to expand the computing power of PC clusters based on virtual resources. Traditional task scheduling methods often suffer from performance degradation when deployed in such complex environments. It is imperative to schedule computing tasks effectively to appropriate nodes, reduce the completion time of jobs, and improve the overall performance of the computing system. This thesis considers a framework for distributed computing and constructs mathematical models to formulate a generic task scheduling problem that minimizes the total running time of concurrent jobs submitted by different users. A heuristic algorithm is proposed to solve the formulated task scheduling problem. This thesis makes the following technical contributions: (1) Formulate a generic task scheduling problem in distributed systems, analyze the principles and shortcomings of existing scheduling algorithms, and identify the key factors affecting the performance of distributed systems. (2) Construct rigorous mathematical models for task scheduling, prove the formulated problem to be NP-complete, and design a novel task scheduling algorithm based on unsupervised learning. (3) Test and evaluate the proposed clustering-based scheduling algorithm in various real-life distributed computing environments and show its performance superiority in terms of task completion time over existing methods.
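The abstract does not spell out the clustering-based scheduler, so the following is only a minimal sketch of the general idea, not the thesis's algorithm. It assumes each task is described by a single size value and each node by a single speed value, groups both with a plain 1-D k-means (standing in for the unspecified unsupervised learning step), and then places each task greedily on the node in the matching group that would finish it earliest. The names `kmeans_1d` and `schedule`, the choice of `k`, and the makespan objective are all illustrative assumptions.

```python
from statistics import mean


def kmeans_1d(values, k, iters=20):
    """Plain 1-D k-means (hypothetical helper); returns one cluster index per value.

    Initial centroids are spread in ascending order over the sorted values,
    so a lower index corresponds to a smaller initial centroid.
    """
    ordered = sorted(values)
    centroids = [ordered[(len(ordered) - 1) * i // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels


def schedule(task_sizes, node_speeds, k=2):
    """Assumed heuristic: match task clusters to node clusters, then go greedy.

    Returns (assignment, makespan), where assignment maps task index to node index.
    """
    task_labels = kmeans_1d(task_sizes, k)
    node_labels = kmeans_1d(node_speeds, k)
    finish = [0.0] * len(node_speeds)  # accumulated busy time per node
    assignment = {}
    # Place the largest tasks first.
    for t in sorted(range(len(task_sizes)), key=lambda i: -task_sizes[i]):
        candidates = [n for n, lab in enumerate(node_labels) if lab == task_labels[t]]
        if not candidates:  # no node in the matching cluster: fall back to all nodes
            candidates = list(range(len(node_speeds)))
        best = min(candidates, key=lambda n: finish[n] + task_sizes[t] / node_speeds[n])
        finish[best] += task_sizes[t] / node_speeds[best]
        assignment[t] = best
    return assignment, max(finish)


if __name__ == "__main__":
    plan, makespan = schedule([100, 90, 10, 8], [10.0, 9.0, 1.0, 1.0])
    print(plan, makespan)
```

In this toy setting the two large tasks go to the two fast nodes and the two small tasks to the slow ones. A real heterogeneous cluster would need richer task and node features (CPU, memory, I/O, data locality) than the single numbers assumed here.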
Keywords/Search Tags: distributed computing, scheduling, Hadoop, clustering, resource allocation