Design And Implementation Of A Stream Data Preprocessing And Service System

Posted on: 2022-08-15
Degree: Master
Type: Thesis
Country: China
Candidate: C Di
Full Text: PDF
GTID: 2518306494471064
Subject: Master of Engineering - Computer Technology
Abstract/Summary:
With the advent of the big data era, the information generated in many fields is growing rapidly. Some of this data is high in volume, arrives in real time, flows continuously, and changes quickly; data of this kind must be processed in real time and is called stream data. Processing and mining stream data in real time to extract its valuable parts can give new impetus to the informatization of industry and society. Throughout this process, data preprocessing takes up a large share of the time, and the quality of the data directly determines the predictive and generalization ability of the model.

In the traditional approach to stream data preprocessing, users call the application programming interface provided by a stream processing framework, write a stream processing program for a specific business scenario, and then upload the program themselves to a resource cluster to run. This approach has drawbacks. First, the many open source stream processing frameworks and other application systems are largely independent and fragmented, with no unified service process that runs from stream data access through real-time preprocessing to serving the processing results. Configuring the frameworks and writing the programs is cumbersome, so users cannot focus on the business logic itself. Second, preprocessing stream data from different sources involves many similar tasks, and repeatedly creating and deploying those tasks through code is labor-intensive.

To address these problems, this thesis designs and implements a stream data preprocessing and service system. The main work is as follows:

1. To address the high learning cost and the long, complex development process of stream processing tasks, the system builds on the open source stream processing framework Flink and encapsulates real-time stream data access, stream data preprocessing, and the servitization of processing results into visual modules. Users can quickly create a stream data preprocessing and servitization task without writing code, simply by connecting visual modules and configuring the relevant parameters. This significantly lowers the barrier to development and improves development efficiency.
2. To address the problem of repeatedly creating and deploying stream data preprocessing tasks through code, the system includes a process parsing and mapping framework at its lowest layer. The framework converts user-configured stream data preprocessing and servitization tasks into executable Flink code, and it also lets developers add new preprocessing operations to the system. Newly added operations are likewise offered to users as visual modules, which improves the flexibility and extensibility of the system.
3. Test results show that, compared with traditional stream data processing systems, the system is efficient, flexible, and configurable, and that it has advantages in ease of use and high availability.
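
The idea behind the process parsing and mapping framework in item 2 can be illustrated with a minimal sketch. It is not the thesis implementation and does not use Flink: it is a pure-Python stand-in that assumes a user-configured task is stored as a set of nodes (visual modules with parameters) and edges (connections between them). The names `REGISTRY`, `register`, `build_pipeline`, and the module types `drop_missing` and `scale` are all hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, Iterator, Optional

Record = dict
# An operator maps a record to a transformed record, or to None to drop it.
Operator = Callable[[Record], Optional[Record]]

# Hypothetical registry: module type -> factory that builds an operator
# from the parameters the user configured on the visual module.
REGISTRY: Dict[str, Callable[[dict], Operator]] = {}


def register(module_type: str):
    """Expose a new preprocessing operation under a module type name."""
    def decorator(factory):
        REGISTRY[module_type] = factory
        return factory
    return decorator


@register("drop_missing")
def drop_missing(params: dict) -> Operator:
    field = params["field"]
    return lambda rec: rec if rec.get(field) is not None else None


@register("scale")
def scale(params: dict) -> Operator:
    field, factor = params["field"], params["factor"]
    return lambda rec: {**rec, field: rec[field] * factor}


def build_pipeline(task: dict) -> Callable[[Iterable[Record]], Iterator[Record]]:
    """Parse a task description and map it to an executable function.

    Assumption: the modules form a linear chain, so a topological order
    of the nodes is the execution order.
    """
    deps = {node_id: set() for node_id in task["nodes"]}
    for src, dst in task["edges"]:
        deps[dst].add(src)
    order = TopologicalSorter(deps).static_order()
    ops = [
        REGISTRY[task["nodes"][n]["type"]](task["nodes"][n].get("params", {}))
        for n in order
    ]

    def run(stream: Iterable[Record]) -> Iterator[Record]:
        for rec in stream:
            for op in ops:
                rec = op(rec)
                if rec is None:  # record was filtered out
                    break
            else:
                yield rec

    return run


# Hypothetical task, as a visual editor might save it.
task = {
    "nodes": {
        "n2": {"type": "scale", "params": {"field": "temp", "factor": 2}},
        "n1": {"type": "drop_missing", "params": {"field": "temp"}},
    },
    "edges": [("n1", "n2")],
}
pipeline = build_pipeline(task)
print(list(pipeline([{"temp": 1.5}, {"temp": None}, {"temp": 4}])))
# prints [{'temp': 3.0}, {'temp': 8}]
```

In this sketch, adding a preprocessing operation only requires registering one more factory, which mirrors how the abstract describes new operations becoming available as modules. A real system would map each node to operators of the underlying stream processing framework rather than to Python functions.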
Keywords/Search Tags: stream data, data preprocessing, visualization process configuration, service