
The Design And Implementation Of A Task Scheduling And Monitoring Platform For Big Data Offline Applications

Posted on: 2022-08-11
Degree: Master
Type: Thesis
Country: China
Candidate: C S Geng
Full Text: PDF
GTID: 2518306338467324
Subject: Computer technology
Abstract/Summary:
With the rapid development and popularization of Internet technologies in recent years, the amount of data accumulated by the company where the author works has grown exponentially, and the company's dependence on big data technologies has grown with it. The company now needs a scheduling platform for big data offline applications that offers an integrated solution for data extraction, analysis, output and querying, while reducing data management costs and aggregating data into a basic query environment. With a more complete data profile, employees would have a tool for exploring the potential value of the data in depth. What capabilities must a platform have to meet these needs? A complete scheduling platform for big data offline applications needs not only strong computing power but also application scheduling, application monitoring, and data post-processing that delivers the data produced by an application to its downstream consumers. This paper implements an application publishing and scheduling platform for big data offline processing applications, based on Kubernetes container cluster management technology, to provide a solution for the enterprise's data processing needs. The main work of this paper is as follows:

(1) Design and implementation of a scheduling platform for big data offline applications. On the basis of a layered project structure, this paper completes the design and development of offline task release, task scheduling and processing, task monitoring, delivery of task output data to downstream consumers, and other functions. The layered architecture keeps the functional modules decoupled from each other, pluggable and easy to maintain.

(2) Design and implementation of automatic management of Presto tables and registration of their partitions. The company's current approach requires a large amount of staff time to keep Presto table management and partition registration working. Based on an analysis of these shortcomings and pain points, this paper proposes a new HTTP-based approach to managing tables and registering partitions, together with an automatic table management and partition registration subsystem that realizes it. The paper presents the architecture diagram of this subsystem and its division into functional modules, and describes the design and implementation of the core functional modules in detail.
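As a rough illustration of what HTTP-based partition registration could look like, the sketch below builds (but does not send) a request to a Presto coordinator's `/v1/statement` endpoint. It is not the subsystem described in the abstract. It assumes the tables live in a Hive-connector catalog named `hive`, which the abstract does not state, and it uses that connector's `system.sync_partition_metadata` procedure. The function name, coordinator URL, user, schema and table names are hypothetical.

```python
import re
import urllib.request

# Conservative identifier check so schema/table names cannot inject SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_partition_sync_request(coordinator_url, user, schema, table, mode="ADD"):
    """Build an HTTP request asking Presto to sync a table's partitions.

    Assumption: the table is managed by the Hive connector in a catalog
    named "hive"; the abstract does not say which connector is used.
    """
    if mode not in {"ADD", "DROP", "FULL"}:
        raise ValueError(f"unsupported mode: {mode}")
    for name in (schema, table):
        if not _IDENT.match(name):
            raise ValueError(f"invalid identifier: {name!r}")

    # Hive connector procedure that registers partitions found in storage.
    sql = f"CALL system.sync_partition_metadata('{schema}', '{table}', '{mode}')"

    return urllib.request.Request(
        url=f"{coordinator_url.rstrip('/')}/v1/statement",
        data=sql.encode("utf-8"),  # the SQL text is the POST body
        headers={"X-Presto-User": user, "X-Presto-Catalog": "hive"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical coordinator address, user, schema and table.
    req = build_partition_sync_request(
        "http://presto.example:8080", "etl", "logs", "events"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

A real client would send the request and then follow the `nextUri` links in the JSON responses until the statement finishes. That polling loop is omitted here.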
Keywords/Search Tags:big data offline applications, offline applications' scheduling platform, Presto table partition registration management, multiple applications scheduling queues, timed tasks release with Airflow