Design And Implementation Of Highly Available Distributed Task Scheduling And Execution System

Posted on: 2020-11-09
Degree: Master
Type: Thesis
Country: China
Candidate: K Wang
GTID: 2428330602950402
Subject: Engineering
Abstract/Summary:
This thesis is based on a big data processing system for an Internet product, developed during the author's postgraduate internship. The Internet product has ten million users, and the number is still growing steadily. To distinguish between different users, and to interact with target users in order to maintain loyalty and stimulate new users' interest, the data of the full existing user base must be processed so that the target users and their related information can be filtered out. Because all user-related data of the product is stored in relational databases, the traditional way of processing it is a multi-threaded program deployed on a single machine, which suffers from low execution efficiency and poor reusability. An alternative is to migrate the target data to a non-relational database and then process it at scale with mature big data tools. However, given the complexity of the current physical storage model, it is very difficult to build a migration model that preserves data integrity.

To address these problems, this thesis combines research on distributed technology with the actual business requirements and implements a highly available distributed task scheduling and execution system based on Zookeeper. The system consists of a unified gateway module that interacts with the external environment, a task scheduling and distribution module that splits and distributes data processing tasks, a task execution module that performs the data processing tasks, a high availability guarantee module that keeps the system highly available, and a log module. The system accepts various types of data processing tasks whose target data is stored in relational databases, and it can handle tasks of different sizes because the task scheduling and allocation module is separate from the task execution modules, which can be scaled out. Given the importance of the task allocation module and the need to handle multiple tasks, a high availability guarantee module is designed and implemented. The task scheduling and allocation module is deployed on two machines, one as the working node and the other as the standby node. When the working node fails, the standby node automatically takes over, which keeps the system available.

Complete functional and performance tests show that the system implemented in this thesis meets expectations. For tasks with large amounts of data, its execution efficiency is much higher than that of the traditional multi-threaded single-machine program. In theory, the processing capacity of the whole system can be increased by adding task execution nodes. Finally, the system offers good independence from business logic, scalability, and high availability.
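The abstract does not state how tasks are split, how sub-tasks are assigned, or how the standby node is promoted. As a rough illustration only, the sketch below assumes that a task is split by primary-key range, that sub-tasks are handed to execution nodes round-robin, and that the active scheduler is simply the first live node in priority order. All names (`SubTask`, `split_task`, `distribute`, `pick_active`, the node names) are hypothetical, and the liveness map stands in for whatever Zookeeper-based detection the thesis actually uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SubTask:
    """A contiguous slice of primary-key values, [start_id, end_id)."""
    start_id: int
    end_id: int


def split_task(min_id: int, max_id: int, chunk_size: int) -> List[SubTask]:
    """Split the inclusive key range [min_id, max_id] into half-open chunks.

    Assumption: the target table has an integer primary key, so a task can
    be divided by key range. The thesis may use a different rule.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        SubTask(start, min(start + chunk_size, max_id + 1))
        for start in range(min_id, max_id + 1, chunk_size)
    ]


def distribute(subtasks: List[SubTask], executors: List[str]) -> Dict[str, List[SubTask]]:
    """Assign sub-tasks to execution nodes round-robin (an assumed policy)."""
    if not executors:
        raise ValueError("at least one executor is required")
    plan: Dict[str, List[SubTask]] = {name: [] for name in executors}
    for i, sub in enumerate(subtasks):
        plan[executors[i % len(executors)]].append(sub)
    return plan


def pick_active(schedulers: List[str], alive: Dict[str, bool]) -> Optional[str]:
    """Return the first live scheduler in priority order, or None.

    The `alive` map is a stand-in for failure detection; in a Zookeeper-based
    design this information would come from the coordination service.
    """
    for name in schedulers:
        if alive.get(name, False):
            return name
    return None


if __name__ == "__main__":
    subtasks = split_task(1, 10, 4)
    plan = distribute(subtasks, ["exec-a", "exec-b"])
    for node, assigned in plan.items():
        print(node, assigned)
    # Working node is down, so the standby node is selected.
    print(pick_active(["working", "standby"], {"working": False, "standby": True}))
```

Under these assumptions, adding a name to the executor list spreads the same sub-tasks over more nodes, which mirrors the abstract's claim that capacity grows with the number of task execution nodes.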
Keywords/Search Tags: Zookeeper, Distributed Systems, Big Data, Task Execution, Task Assignment