
Design And Implementation Of Scheduling Subsystem Of Big Data Processing Supporting Platform

Posted on: 2016-07-18
Degree: Master
Type: Thesis
Country: China
Candidate: G H Ren
GTID: 2298330467993191
Subject: Communication and Information System
Abstract/Summary:
With the rapid development of computer technology, the storage capacity and computing power of computer hardware have greatly improved. This allows businesses to store user data at lower cost and to apply data mining and data analysis to it to derive more value. As a result, we have entered the era of big data, in which traditional databases prove inadequate for handling massive data, and how to efficiently store and analyze massive amounts of user data has become a difficult problem. In response, big data processing supporting platforms have emerged, which adopt a divide-and-conquer strategy to store and compute massive data in a distributed manner.

In a big data processing supporting platform, the scheduling subsystem plays a very important role: it is responsible for scheduling and executing the computing tasks of the whole cluster. Based on the actual project requirements of the mobile phone reading base in Zhejiang, this thesis aims to enhance the platform's intelligent scheduling capability and improve the overall scheduling efficiency of tasks by designing and implementing the scheduling subsystem of the big data processing supporting platform.

A long investigation of the actual project revealed several problems in the big data processing supporting platform. First, the web interface of the scheduling system can only monitor the running status of workflows; it cannot configure them. Second, when there are a large number of scheduling tasks with complex dependency relationships among them, scheduling configuration becomes very difficult, whether it is done by writing configuration files or through a graphical interface. In addition, scheduling efficiency needs to be optimized in order to reduce cluster resource occupancy and complete computing tasks faster.
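Why complex dependencies make scheduling hard, and where efficiency gains can come from, can be illustrated with a small sketch. The abstract does not describe the platform's actual scheduling algorithm; the following is only a hypothetical illustration that uses Kahn's topological sort to group tasks into "waves", so that tasks whose dependencies have all finished can run concurrently. The task names, durations, and the assumption of unlimited parallel execution slots are invented for the example.

```python
from collections import defaultdict


def schedule_waves(deps):
    """Group tasks into waves using Kahn's topological sort.

    deps maps each task to the tasks it depends on. Every task in a wave
    depends only on tasks in earlier waves, so a wave's tasks can run
    concurrently. Raises ValueError if the dependencies contain a cycle.
    """
    # Include tasks that appear only as dependencies.
    tasks = set(deps)
    for parents in deps.values():
        tasks.update(parents)

    # Count unfinished dependencies per task and record reverse edges.
    indegree = {t: 0 for t in tasks}
    children = defaultdict(list)
    for task, parents in deps.items():
        for p in set(parents):
            children[p].append(task)
            indegree[task] += 1

    waves = []
    current = sorted(t for t in tasks if indegree[t] == 0)
    scheduled = 0
    while current:
        waves.append(current)
        scheduled += len(current)
        ready = []
        for t in current:
            # Finishing t releases one dependency of each child.
            for c in children[t]:
                indegree[c] -= 1
                if indegree[c] == 0:
                    ready.append(c)
        current = sorted(ready)

    # Tasks left unscheduled can only be waiting on each other: a cycle.
    if scheduled != len(tasks):
        raise ValueError("dependency cycle detected")
    return waves


def wave_makespan(waves, durations):
    # A wave starts only after the slowest task of the previous wave ends.
    return sum(max(durations[t] for t in wave) for wave in waves)


# Hypothetical tasks and durations (minutes), invented for illustration.
DEPS = {
    "clean_logs": [],
    "load_users": [],
    "join_reading": ["clean_logs", "load_users"],
    "daily_report": ["join_reading"],
    "recommendations": ["join_reading"],
}
DURATIONS = {"clean_logs": 30, "load_users": 10, "join_reading": 20,
             "daily_report": 5, "recommendations": 15}

if __name__ == "__main__":
    waves = schedule_waves(DEPS)
    print(waves)
    print("wave-by-wave:", wave_makespan(waves, DURATIONS))
    print("serial:", sum(DURATIONS.values()))
```

With these hypothetical durations, running the tasks wave by wave finishes in 65 minutes versus 80 minutes when every task runs serially. Waiting for a whole wave to finish is a simplification: with unlimited slots, starting each task as soon as its own dependencies finish (a critical-path schedule) is never slower.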
To address these problems, this thesis designs and implements a graphical scheduling configuration subsystem, an automated scheduling configuration subsystem, and a scheduling efficiency optimization subsystem. Their respective functions are as follows: the first provides a graphical interface for users to configure workflows and generates the corresponding configuration from the graphical configuration result; the second automatically generates a configuration for users, provided that they supply basic scheduling information and configuration file templates; the third optimizes the overall scheduling efficiency of the system and shortens the response time of all scheduling tasks.

The thesis is organized as follows. Chapter 1 is the introduction, which briefly describes the research background, research content, current state of research, and research significance. Chapter 2 describes the requirements analysis and overall design of the scheduling subsystem. Chapter 3 focuses on the detailed design and implementation of the graphical scheduling configuration subsystem. Chapter 4 focuses on the detailed design and implementation of the automated scheduling configuration subsystem. Chapter 5 focuses on the detailed design and implementation of the scheduling efficiency optimization subsystem. Chapter 6 summarizes the research work and its achievements, and outlines prospects for the future development of the big data processing supporting platform.
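The automated scheduling configuration subsystem is described only in terms of its inputs (basic scheduling information plus configuration file templates) and its output (a generated configuration). A minimal sketch of that idea, assuming a hypothetical INI-like template and hypothetical task fields (`name`, `command`, `schedule`, `depends_on`), can be built on Python's standard `string.Template`; the platform's real configuration format is not given in the abstract.

```python
from string import Template

# Hypothetical per-task template; the platform's real configuration
# file format is not specified in the abstract.
TASK_TEMPLATE = Template(
    "[task:$name]\n"
    "command = $command\n"
    "schedule = $schedule\n"
    "depends_on = $depends_on\n"
)


def generate_config(tasks, template=TASK_TEMPLATE):
    """Render one configuration section per task from basic scheduling info.

    Each task is a dict with the (hypothetical) keys name, command,
    schedule and depends_on, where depends_on is a list of task names.
    """
    names = {t["name"] for t in tasks}
    sections = []
    for t in tasks:
        # Reject references to tasks that are not part of this configuration.
        unknown = [d for d in t["depends_on"] if d not in names]
        if unknown:
            raise ValueError(f"{t['name']} depends on unknown task(s): {unknown}")
        sections.append(template.substitute(
            name=t["name"],
            command=t["command"],
            schedule=t["schedule"],
            depends_on=", ".join(t["depends_on"]) or "none",
        ))
    # Each section ends with a newline, so joining adds a blank line between them.
    return "\n".join(sections)


if __name__ == "__main__":
    # Hypothetical basic scheduling information supplied by a user.
    tasks = [
        {"name": "clean_logs", "command": "sh clean_logs.sh",
         "schedule": "0 1 * * *", "depends_on": []},
        {"name": "daily_report", "command": "sh daily_report.sh",
         "schedule": "0 3 * * *", "depends_on": ["clean_logs"]},
    ]
    print(generate_config(tasks))
```

Validating that every dependency names a known task before rendering catches a class of mistakes that is easy to make when many interdependent tasks are configured by hand.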
Keywords/Search Tags: business intelligence, big data, hadoop, scheduling subsystem