Design And Implementation Of Big Data Application Scheduling System

Posted on: 2020-10-20
Degree: Master
Type: Thesis
Country: China
Candidate: M G He
Full Text: PDF
GTID: 2428330578457159
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, people have entered the era of big data and the mobile Internet. Big data applications exploit the value of data: they extract useful information from massive data sets through data analysis in order to give users decision support [6]. Processing this data efficiently is therefore key, and the scheduling of tasks during data processing strongly affects overall performance and resource utilization. This thesis presents a big data application scheduling system built on the open-source Azkaban scheduling system to meet the actual needs of X company. First, it analyzes the company's main task-scheduling requirements from its actual business needs, surveys several major open-source scheduling systems currently available, and proposes a suitable implementation scheme to guide the choice of scheduling system. Second, based on the requirements analysis and the technical survey, it selects the open-source Azkaban scheduling system as the basis for design and development. The system adopts a microservice architecture that divides it into three parts: the web management part, the scheduler part, and the executor part. The system is deployed as a cluster to achieve highly available scheduling. The web management part uses the SSM (Spring, Spring MVC, MyBatis) architecture to implement the web interface for the management side. In addition, the graphical editing interface (IDE) not only implements the core workflow scheduling logic of the scheduling system but also supports drag-and-drop editing of complex DAG workflows. In the executor part, Azkaban's plug-in mechanism can be used to support job plug-ins for different scenarios, so that workflows with different task types can be scheduled. Furthermore, the Docker containerization technology is used to isolate workflow execution environments and avoid interference between workflows in different environments. Finally, the system is optimized according to its internal use at the company to make it an efficient big data application scheduling system. The system described in this thesis is currently running normally in the production environment. Judging from its actual online operation, the scheduling system meets the company's daily business needs and supports highly concurrent scheduling of tens of thousands of tasks. The system will continue to be optimized and iteratively upgraded, with the aim of becoming one of the company's core big data application products.
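
The abstract mentions scheduling DAG workflows and dispatching jobs of different types through plug-ins. The following is only a minimal illustrative sketch of those two ideas, not the thesis's implementation and not Azkaban's API: it orders jobs with Kahn's topological sort and looks up a handler per job type. All names here (`topological_order`, `JOB_PLUGINS`, `run_workflow`, the job names and types) are hypothetical.

```python
from collections import deque


def topological_order(dependencies):
    """Return job names in an order that respects dependencies (Kahn's algorithm).

    `dependencies` maps each job to the list of jobs it depends on.
    """
    # Collect every job, including ones that appear only as a dependency.
    jobs = set(dependencies)
    for deps in dependencies.values():
        jobs.update(deps)

    indegree = {job: 0 for job in jobs}
    dependents = {job: [] for job in jobs}
    for job, deps in dependencies.items():
        for dep in deps:
            indegree[job] += 1
            dependents[dep].append(job)

    # Jobs with no unmet dependencies are ready; sorting keeps the order deterministic.
    ready = deque(sorted(job for job in jobs if indegree[job] == 0))
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for nxt in sorted(dependents[job]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # If some jobs were never released, the graph has a cycle and is not a DAG.
    if len(order) != len(jobs):
        raise ValueError("workflow contains a cycle")
    return order


# Hypothetical plug-in registry: one handler per job type (assumption, for illustration).
JOB_PLUGINS = {
    "shell": lambda name: f"shell:{name}",
    "spark": lambda name: f"spark:{name}",
}


def run_workflow(dependencies, job_types):
    """Run each job through the handler for its type, in dependency order."""
    results = []
    for job in topological_order(dependencies):
        handler = JOB_PLUGINS[job_types[job]]
        results.append(handler(job))
    return results
```

In this sketch, a new task type would be supported by adding an entry to the registry rather than changing the ordering logic; a real executor would also run independent jobs concurrently and, as the abstract describes, inside isolated containers.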
Keywords/Search Tags:big data application, dispatch system, Azkaban