Research On Task Scheduling Mechanism Of Storm-based Data Processing System

Posted on: 2019-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Cai
GTID: 2370330590465519
Subject: Information and Communication Engineering
Abstract/Summary:
Astronomical observation data is generated continuously at geographically scattered observatories, so the metadata processing system must deliver high real-time performance. A traditional batch-processing big data platform has to download, store, and then process the data, which makes it difficult to meet these real-time requirements. Storm, a distributed real-time computing framework for streaming data, can guarantee timely processing of large data volumes. This thesis therefore applies Storm to the astronomical metadata processing system to receive and process data streams in real time.

Analysis shows that one of the core factors affecting the performance of a Storm system is its task scheduling mechanism. Storm's default scheduler is a simple polling (round-robin) mechanism. It achieves a degree of load balancing, but under the more complex task requirements of real-time astronomical metadata processing it becomes a significant performance bottleneck. The thesis optimizes Storm scheduling in two respects: first, adding an elasticity mechanism to improve resource utilization; second, optimizing task deployment to reduce communication overhead.

To address Storm's lack of an elasticity mechanism, the thesis embeds a real-time adaptive elasticity module in the system. The module continuously collects status information about the running system, makes scheduling decisions based on that information, and dynamically allocates a reasonable amount of computing resources to each Topology application so that system resources are used more fully. To address unreasonable task deployment that causes excessive communication overhead, the thesis proposes a task scheduling optimization method based on graph partitioning. The running Topology application is treated as a weighted graph, and graph partitioning is applied to that graph to obtain a task deployment and scheduling scheme that reduces communication overhead while preserving load balance. The scheme is then submitted to the system for execution, with the goal of reducing processing delay and increasing throughput.

The thesis implements the proposed scheduling optimization scheme, builds an experimental environment, and thoroughly tests the function and performance of the system. The experimental results show that the scheme improves resource utilization, processing delay, and throughput. The research and implementation of this scheduling scheme improve the performance of the Storm system and provide key technical support for the real-time processing of astronomical metadata.
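The abstract does not say which graph partitioning algorithm the thesis uses, so the following is only a minimal sketch of the general idea, not the author's method. It models tasks as vertices and inter-task traffic as edge weights, starts from a round-robin placement like the default scheduler, and then applies a simple pairwise-swap local search as a stand-in for a real partitioner. Swapping two tasks never changes how many tasks each node holds, so load balance is preserved while cross-node traffic falls. All function names, task names, and traffic weights here are hypothetical.

```python
from itertools import combinations


def round_robin(tasks, num_nodes):
    """Baseline placement: hand tasks to nodes in turn, ignoring traffic."""
    return {task: i % num_nodes for i, task in enumerate(tasks)}


def cut_weight(edges, placement):
    """Total traffic on edges whose two tasks sit on different nodes."""
    return sum(w for (u, v), w in edges.items() if placement[u] != placement[v])


def refine_by_swaps(edges, placement):
    """Swap pairs of tasks across nodes while doing so lowers the cut weight.

    Each swap exchanges two tasks, so the per-node task count is unchanged.
    This is a simple local search used for illustration only.
    """
    placement = dict(placement)  # work on a copy; leave the input untouched
    best = cut_weight(edges, placement)
    improved = True
    while improved:
        improved = False
        for u, v in combinations(list(placement), 2):
            if placement[u] == placement[v]:
                continue  # swapping within one node changes nothing
            placement[u], placement[v] = placement[v], placement[u]
            cost = cut_weight(edges, placement)
            if cost < best:
                best = cost
                improved = True
            else:
                # Undo a swap that did not help.
                placement[u], placement[v] = placement[v], placement[u]
    return placement


if __name__ == "__main__":
    # Hypothetical topology: edge weights stand for tuples per second.
    tasks = ["t1", "t2", "t3", "t4"]
    edges = {("t1", "t2"): 10, ("t3", "t4"): 10, ("t2", "t3"): 1}

    baseline = round_robin(tasks, num_nodes=2)
    refined = refine_by_swaps(edges, baseline)
    print("round-robin cut:", cut_weight(edges, baseline))
    print("refined cut:", cut_weight(edges, refined))
    print("refined placement:", refined)
```

In this toy case the round-robin placement separates both heavily communicating pairs, while the refined placement co-locates them and leaves only the light edge crossing nodes. A production scheduler would more likely rely on a multilevel partitioner and on traffic weights measured at runtime.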
Keywords/Search Tags: big data, task scheduling optimization, real-time processing, streaming data processing