Design And Implementation Of SQL-Like Tool Processing Real-Time Data Streams Based On Storm

Posted on: 2019-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: J F Kong
Full Text: PDF
GTID: 2428330542996824
Subject: Software engineering
Abstract/Summary:
Real-time streaming data queries lack a mature SQL solution and rely mainly on general-purpose programming languages. The StreamSQL system provides a continuous query solution for real-time data streams. Built on a stream computing platform, it lets business users query streaming data in a unified and simple way using a SQL-like language. Functionally, the system provides basic SQL operations such as filtering and transformation, and also introduces window-based operations on the data stream such as calculation, aggregation, association, and merging.

First, I design the StreamSQL syntax according to the characteristics of real-time data. Compared with offline data, stream data has the dual properties of tuple and time: any element of the stream can be represented as Element<tuple,time>, where the tuple holds the data structure and data content, and time is the logical time of the tuple. Because data streams flow continuously, StreamSQL has no concept of tables, only of streams; every operation, whether a query or a creation operation, is implemented on streams. Query operations such as aggregation or association depend on the time attribute of the stream: the data arriving within a certain period must first be gathered into a view and then processed. For this reason, a window concept is added to the StreamSQL syntax. Windows are an important means of handling the unbounded and continuously moving nature of data streams in stream processing. A window turns the tuples from a certain time span, or a certain number of tuples, into a static view that can be queried much like a database table. Streaming data has two types of windows: time windows and record windows. The former expires data in units of time, and the latter expires data in units of records. Windows handle data as batch windows: once the given time span or record count is reached, the expired data in that range is emitted as a batch, forming a view for the next step of query processing. Depending on the query, the system provides simple stream filtering, aggregation, and association operations. The query result is itself a data stream, which is written to a message queue or persisted to the file system.

Second, in terms of system design, the system is divided into three decoupled modules, both to divide work among developers and to allow extension to more real-time computing frameworks. The first is the SQL-parsing module, which uses ANTLR to translate SQL statements into Java objects, mapping the domain language to the programming language and producing the execution plan. The second is the Stream-operator module, which abstracts the data computation logic and does not depend on a specific real-time computing framework. It implements the input, output, and function operator units according to the processing characteristics of streaming data. The third is the Storm-assembly module, which encapsulates the programming interface provided by the Storm computing platform. According to the operators required by the execution plan, it injects Stream operator instances when creating the spout and bolt instances and assembles the Storm topology program. In addition, like the JDBC drivers provided by relational databases, the StreamSQL system includes a JDBC driver module, designed as a server and a client. The client provides the JDBC driver for Java developers.

Finally, in terms of system usage, the real-time computing framework is a Storm cluster. In a production environment, StreamSQL works with the Storm cluster as a consumer and producer for Kafka clusters. It can also persist data streams to the HDFS file system, which makes message queue processing more flexible and convenient. Non-programmers, such as business users or domain experts, can query real-time streaming data after simple training in StreamSQL syntax. Program developers can also use the StreamSQL system to process simple data flows and avoid extensive repetitive development tasks. Upstream developers can use the StreamSQL system in their projects through the JDBC driver.
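
The batch-window idea in the abstract can be illustrated with a minimal Java sketch. It is not the thesis's implementation: the class names (`BatchWindowSketch`, `Element`, `CountBatchWindow`, `TimeBatchWindow`) and the `sum` aggregation are hypothetical, and the time window here is assumed to be driven by each element's logical time rather than by a wall-clock timer. A record window emits a batch once a fixed number of elements has arrived, and a time window emits its batch when an element arrives at or past the current window end; each emitted batch is the static view that a later query step would process.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchWindowSketch {

    /** One stream element: the tuple (payload) plus its logical time. */
    static final class Element<T> {
        final T tuple;
        final long time;
        Element(T tuple, long time) { this.tuple = tuple; this.time = time; }
    }

    /** Record window (hypothetical): emits a batch once `size` elements have arrived. */
    static final class CountBatchWindow<T> {
        private final int size;
        private List<Element<T>> buffer = new ArrayList<>();

        CountBatchWindow(int size) { this.size = size; }

        /** Returns the expired batch, or null while the window is still filling. */
        List<Element<T>> add(Element<T> e) {
            buffer.add(e);
            if (buffer.size() < size) return null;
            List<Element<T>> batch = buffer;
            buffer = new ArrayList<>();
            return batch;
        }
    }

    /** Time window (hypothetical): emits a batch when an element arrives at or past the window end. */
    static final class TimeBatchWindow<T> {
        private final long length;
        private long windowEnd;
        private boolean open = false;
        private List<Element<T>> buffer = new ArrayList<>();

        TimeBatchWindow(long length) { this.length = length; }

        /** Returns the expired batch, or null if the element falls in the current window. */
        List<Element<T>> add(Element<T> e) {
            if (!open) { windowEnd = e.time + length; open = true; } // first element opens the window
            List<Element<T>> batch = null;
            if (e.time >= windowEnd) {
                batch = buffer;
                buffer = new ArrayList<>();
                while (e.time >= windowEnd) windowEnd += length; // skip over empty windows
            }
            buffer.add(e);
            return batch;
        }
    }

    /** Aggregation over one static view: the sum of the tuple values. */
    static long sum(List<Element<Long>> view) {
        long total = 0;
        for (Element<Long> e : view) total += e.tuple;
        return total;
    }

    public static void main(String[] args) {
        CountBatchWindow<Long> window = new CountBatchWindow<Long>(3);
        for (long i = 1; i <= 7; i++) {
            List<Element<Long>> view = window.add(new Element<Long>(i * 10, i));
            if (view != null) System.out.println("sum=" + sum(view));
        }
    }
}
```

In this sketch the seventh element stays buffered because its window of three records never fills, which mirrors the abstract's point that a view only exists once a batch has expired.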
Keywords/Search Tags:real-time streaming data, SQL, Continuous query