The Design And Implementation Of Data Warehouse Job Scheduling System In The Big Data Environment

Posted on: 2020-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
GTID: 2518305771956179
Subject: Master of Engineering
Abstract/Summary:
As the big data era continues to develop, enterprise data is growing enormously in scale, diversifying in type, and becoming more complex in its relationships. As a result, building an enterprise data warehouse runs into problems such as untimely job processing, an inability to process new data types, and mismatches between job logic and the various business scenarios, which make it difficult to meet the needs of enterprise development. This thesis studies and analyzes the problems faced by the data warehouse job scheduling system in the big data environment, proposes a design scheme for a job scheduling system, and gives the implementations of its key functions. The solution adopts a distributed architecture overall and uses a high-concurrency design in multiple components, which improves the load capacity of the system when processing massive data. The scheme also abstracts the basic objects in the field of data operations and hides the differences between types, which improves the system's ability to manage multiple types of data operations and to extend to new types. Finally, the solution abstracts and designs job constraints according to business characteristics and introduces the idea of workflow management, so that jobs run quickly and efficiently while satisfying those constraints.

The thesis analyzes the functional and non-functional requirements of the system in detail and divides the system into data operation development, workflow engine, tenant authority management, execution cluster management, and execution node modules, among others. Based on this, it gives the detailed implementations of the overall architecture and of some key modules. The system uses the Spring Boot framework for rapid setup. The modules are developed and deployed independently and communicate through RESTful interfaces. Users connect to the development-side module of the system through a web browser to obtain information or convey instructions. Testing and feedback from actual operation indicate that the system meets the needs of developing and scheduling data operations in the big data environment and improves the efficiency of developers.
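The abstract does not show how the workflow engine orders jobs, so the following is only a minimal sketch of one common way to satisfy dependency constraints between jobs: topological ordering into "waves" whose members could run concurrently. It uses Python's standard `graphlib` module rather than the thesis's Spring Boot implementation, and the function name `schedule_waves` and all job names are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError


def schedule_waves(dependencies):
    """Group jobs into waves.

    `dependencies` maps each job name to the set of jobs that must finish
    before it. Every job in a wave has all of its upstream jobs in earlier
    waves, so the jobs within one wave could be dispatched concurrently.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose upstream jobs are done
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unlock downstream jobs
    return waves


# Hypothetical warehouse jobs: two loads feed a join, which feeds a report.
deps = {
    "load_orders": set(),
    "load_users": set(),
    "join_orders_users": {"load_orders", "load_users"},
    "daily_report": {"join_orders_users"},
}
waves = schedule_waves(deps)
for index, wave in enumerate(waves):
    print(index, wave)
```

In a distributed setting, each wave would be handed to execution nodes instead of being printed, and a real engine would likely also handle constraints this sketch ignores, such as time windows, retries, and tenant permissions.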
Keywords/Search Tags: Big data, Data warehouse, Job scheduling, Workflow