
The Research And Implementation Of Data Stream Processing And Analysis Engine Based On DAG

Posted on: 2017-07-30
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Li
GTID: 2348330566456726
Subject: Software engineering
Abstract/Summary:
The continuing growth in data volume has driven the development of big data technologies, and big data analysis has become an important branch of software engineering. Advances in messaging middleware now provide the technical basis for ingesting and analyzing data in real time. The existing data processing platform used by the Chinese Academy of Sciences (CAS) cannot meet the demand for real-time data processing. After comparing the advantages and disadvantages of existing systems in China and abroad, this thesis proposes a DAG-based stream data processing and analysis engine. The engine uses existing computing resources flexibly, handles massive amounts of data efficiently in real time, and scales well enough to meet the needs of CAS real-time data processing tasks.

Starting from an analysis of the current demand for real-time data processing, the thesis makes the following theoretical contributions:
(1) An operator model is presented. The model abstracts the data operations of the existing system, which keeps it closely aligned with the business scenario while making it easy to use and reusable.
(2) A task scheduling algorithm is designed. The algorithm is derived from the internal logic of the distributed computing framework and ensures reliable and efficient data processing.
(3) An asynchronous communication mechanism is introduced. The mechanism improves on the synchronous communication mechanism in light of practical conditions, ensuring communication efficiency while releasing resources at the same time. Through its interface design, it achieves high cohesion and low coupling among the various parts.

The DAG-based stream processing and data analysis engine not only provides a concrete realization of the theoretical model but is also the necessary means of verifying the model's correctness. On this basis, the design comprises the following components:
(1) A control engine is designed. The control engine provides users with a friendly graphical interface and wraps database operations in a function package so that it is easy to use.
(2) A scheduler engine is designed. The scheduler engine selects and schedules the underlying distributed computing framework for users.
(3) An executor engine is designed. The executor engine submits users' computations to the distributed computing framework. Because it is the component bound to the underlying framework, it reduces the coupling among the distributed computing framework, the scheduler engine, and the operator model. The executor engine is pluggable: it is adapted to each distributed computing framework in order to submit computations to it, and it can be deployed according to actual conditions.

The model and the engine are verified, and the problems encountered in each module during development are resolved. The correctness of the operator model and the usability of the system are also validated. The engine solves the problem of streaming data computation and offers a reliable example of stream computing in engineering practice, and it also provides a reference model and algorithm for theoretical research on stream computing.
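
The abstract does not give the operator model or the scheduling algorithm in detail, so the following is only a minimal sketch of the general idea of running operators arranged as a DAG. Every name in it (Operator, topological_order, run_pipeline, the example operators) is hypothetical and not taken from the thesis, and Kahn's algorithm stands in for the thesis's own task scheduling algorithm, which is not described here. The sketch processes a single in-memory batch and leaves out distribution, asynchronous communication, and any real framework such as Spark Streaming.

```python
from collections import deque
from typing import Callable, Dict, List


class Operator:
    """Hypothetical operator: a named transformation applied to a batch of records."""

    def __init__(self, name: str, fn: Callable[[list], list]):
        self.name = name
        self.fn = fn


def topological_order(edges: Dict[str, List[str]]) -> List[str]:
    """Kahn's algorithm: order nodes so each one comes after all of its upstream nodes."""
    indegree = {node: 0 for node in edges}
    for targets in edges.values():
        for target in targets:
            indegree[target] = indegree.get(target, 0) + 1
    # Nodes with no upstream dependency can run first.
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for target in edges.get(node, []):
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)
    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return order


def run_pipeline(operators: Dict[str, Operator],
                 edges: Dict[str, List[str]],
                 source_batch: list) -> Dict[str, list]:
    """Run one batch through the DAG; root operators read the source batch,
    every other operator reads the concatenated output of its upstream operators."""
    has_upstream = {t for targets in edges.values() for t in targets}
    inputs = {name: [] for name in operators}
    outputs = {}
    for name in topological_order(edges):
        batch = inputs[name] if name in has_upstream else list(source_batch)
        outputs[name] = operators[name].fn(batch)
        for target in edges.get(name, []):
            inputs[target].extend(outputs[name])
    return outputs


# Example: parse -> keep_even -> square
operators = {
    "parse": Operator("parse", lambda b: [int(x) for x in b]),
    "keep_even": Operator("keep_even", lambda b: [x for x in b if x % 2 == 0]),
    "square": Operator("square", lambda b: [x * x for x in b]),
}
edges = {"parse": ["keep_even"], "keep_even": ["square"], "square": []}
result = run_pipeline(operators, edges, ["1", "2", "3", "4"])
```

In this sketch the graph is kept separate from the operators so that the same operators can be rewired into a different DAG, which loosely mirrors the reusability goal stated for the operator model.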
Keywords/Search Tags:Spark Streaming, DAG stream processing, Flow calculation engine