Design And Implementation Of Scheduling Subsystem Of Big Data Processing Supporting Platform

Posted on:2016-07-18

Degree:Master

Type:Thesis

Country:China

Candidate:G H Ren

GTID:2298330467993191

Subject:Communication and Information System

Abstract/Summary:

With the rapid development of computer technology, the storage capacity and computing power of computer hardware have improved greatly, allowing businesses to store users' data at lower cost and to derive more value from it through data mining and data analysis. As a result, we have entered the era of big data. In this era, traditional databases prove inadequate when faced with massive data, and efficiently storing and analyzing massive amounts of user data has become a difficult problem. In response, the big data processing supporting platform has emerged, which adopts a divide-and-conquer policy, storing and computing massive data in a distributed manner.

In the big data processing supporting platform, the scheduling subsystem plays a very important role: it is responsible for scheduling and executing the computing tasks of the whole cluster. Based on the actual project requirements of the mobile phone reading base in Zhejiang, this paper aims to enhance the intelligent scheduling capability of the platform and improve the overall scheduling efficiency of tasks by designing and implementing the scheduling subsystem of the big data processing supporting platform.

A long investigation of the actual project found several problems in the big data processing supporting platform. First, the web interface of the scheduling system can only monitor the running status of workflows; it cannot configure them. Second, when there are many scheduling tasks with complex dependency relationships among them, configuring scheduling becomes very difficult, whether by writing configuration files or by using a graphical interface. Third, scheduling efficiency needs to be optimized in order to reduce the occupancy of cluster resources and complete computing tasks faster.

To address these problems, this paper designs and implements three subsystems: a graphical scheduling configuration subsystem, an automated scheduling configuration subsystem, and a scheduling efficiency optimization subsystem. The first provides a graphical interface for users to configure workflows and generates the corresponding configuration from the graphical configuration result. The second automatically generates a configuration, provided that users supply the basic scheduling information and configuration file templates. The third optimizes the overall scheduling efficiency of the system and shortens the response time of the scheduling tasks as a whole.

This paper is organized as follows. The first chapter is an introduction that briefly describes the research background, research content, current state of research, and research significance. The second chapter describes the requirements analysis and overall design of the scheduling subsystem. The third chapter focuses on the detailed design and implementation of the graphical scheduling configuration subsystem. The fourth chapter focuses on the detailed design and implementation of the automated scheduling configuration subsystem. The fifth chapter focuses on the detailed design and implementation of the scheduling efficiency optimization subsystem. The sixth chapter summarizes the research and its results, and discusses prospects for the future development of the big data processing supporting platform.
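
The idea behind the automated scheduling configuration subsystem (basic scheduling information plus a configuration file template in, a generated configuration out) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not specify the configuration format or the algorithm, so the one-line template, the task names, and the functions `topological_order` and `generate_config` are all hypothetical. The sketch assumes each task lists the tasks it depends on, orders the tasks with Kahn's algorithm so that every task appears after its dependencies, and fills the template once per task.

```python
from collections import deque
from string import Template

# Hypothetical one-line template; the thesis does not specify its format.
ACTION_TEMPLATE = Template("action name=$name command=$command after=$after")


def topological_order(deps):
    """Order tasks so each one follows its dependencies (Kahn's algorithm).

    `deps` maps a task name to the list of task names it depends on.
    Every dependency must itself be a key of `deps`.
    """
    indegree = {task: len(parents) for task, parents in deps.items()}
    children = {task: [] for task in deps}
    for task, parents in deps.items():
        for parent in parents:
            children[parent].append(task)

    # Start with tasks that have no dependencies; sort for a stable result.
    ready = deque(sorted(t for t, d in indegree.items() if d == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in sorted(children[task]):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order


def generate_config(tasks):
    """Render one template line per task, in dependency order."""
    deps = {name: info["after"] for name, info in tasks.items()}
    lines = []
    for name in topological_order(deps):
        info = tasks[name]
        lines.append(ACTION_TEMPLATE.substitute(
            name=name,
            command=info["command"],
            after=",".join(info["after"]) or "-",
        ))
    return "\n".join(lines)


# Hypothetical scheduling information supplied by a user.
TASKS = {
    "report": {"command": "report.sh", "after": ["clean", "load"]},
    "load": {"command": "load.sh", "after": []},
    "clean": {"command": "clean.sh", "after": ["load"]},
}

if __name__ == "__main__":
    print(generate_config(TASKS))
```

Rejecting cyclic dependencies at generation time means an invalid workflow is caught before any configuration is handed to the scheduler, which matters most in the case the abstract highlights: many tasks with complex dependencies.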

Keywords/Search Tags:

business intelligence, big data, Hadoop, scheduling subsystem

Related items

1	The Study Of Synthetic Analyses Subsystem Based On Business Intelligence Technology In P Company
2	Design And Implementation Of Background Data Scheduling Subsystem In Telecom Business Intelligence
3	Research And Application Of Hadoop In Business Intelligence
4	Business Intelligence Research In The Cloud Environment
5	Design And Implementation Of Data Warehouse Scheduling Subsystem In Traffic Management System
6	Reactive Scheduling For Online Analytical Processing Over Hadoop/HBase Cluster
7	The Design And Realization Of The Data Acquisition And Processing Subsystem Of CCPAS
8	Design And Implementation Of A Bank Business Intelligence Data Analysis Platform Based On Business Intelligence
9	Design And Implementation Of Business Intelligence Systems
10	Design And Implementation Of Call Center Business Intelligence System Based On Pentaho