Big Data Stream Query Framework And Research On Arithmetic Operator

Posted on:2017-05-21

Degree:Master

Type:Thesis

Country:China

Candidate:C C Jiang

Full Text:PDF

GTID:2308330488997066

Subject:Information networks

Abstract/Summary:

Big data brings new challenges to stream query systems. Most early stream query systems are centralized: when memory is saturated, the node can't process data efficiently, which greatly affects the results. The join is a very important operation in stream query processing and has been studied by many scholars. The emergence of big data and cloud computing raises new problems for the stream join operator: how to distribute data that have join relationships to the same node, and how to shed load within a node. Excessive data inevitably puts pressure on the system, and if the system drops data at random when it is overloaded, result accuracy is reduced.

This thesis studies and analyzes these issues in depth and proposes a new real-time query framework for processing continuous queries. The framework redesigns the group-by operator, aggregation operator, join operator, and other common operators on top of Storm.

Building on this study of the complex join operator, the thesis proposes RMS-Join, a multi-stream equi-join algorithm based on the relational model. First, the algorithm reduces the total cost of data distribution and join operations by computing a cost model of the stream join. Then, it sets a primary key so that data that need to be joined stay on the same node. Finally, the tree model is joined with the data that result from load shedding. Compared with the traditional multi-stream join algorithm, RMS-Join effectively improves the efficiency and accuracy of the join operation.

Finally, the thesis presents a locally adaptive load shedding method for large volumes of data. When the system is overloaded, the method calculates the DTW of the data streams before they enter the system and drops some data based on the overload condition and the data characteristics. This preserves the data characteristics as far as possible and so maximizes output accuracy.
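
The abstract does not say how RMS-Join derives its primary key or how the DTW value drives shedding decisions, so the following Python sketch only illustrates the two underlying ideas under stated assumptions. It assumes that DTW stands for dynamic time warping, that co-location is achieved by hashing the join key, and that the local cost is the absolute difference. The function names `route_to_node` and `dtw_distance` are hypothetical and are not taken from the thesis or from Storm.

```python
import zlib
from typing import Sequence


def route_to_node(join_key: str, num_nodes: int) -> int:
    """Choose a node index from the join key alone.

    Tuples from different streams that carry the same key get the
    same index, so they can be joined locally on one node.
    (Assumption: hash routing; the thesis may use another scheme.)
    """
    # crc32 is stable across processes, unlike Python's built-in hash().
    return zlib.crc32(join_key.encode("utf-8")) % num_nodes


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Textbook dynamic time warping distance between two sequences,
    using |x - y| as the local cost. Runs in O(len(a) * len(b))."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the DTW distance between a[:i] and b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

One possible use, again as an assumption rather than the thesis's method: a shedder could compare a candidate thinned window against the original window with `dtw_distance` and prefer the candidate with the smaller distance, since a small distance suggests the shape of the stream has been preserved.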

Keywords/Search Tags:

Big Data, Stream Query, Stream Join, Load Shedding, Storm

Related items

1	Research Of Auto-adapted Load Shedding Algorithm On Data Stream Inquires Continuously
2	Research On Load Shedding Technology Based On Sliding Window Over Data Streams
3	Load Management Policies Of The Distributed Stream Processing System ARTs-SH
4	Research And Design On The Data Stream System
5	Research On Load Management Technology In Distributed Data Stream Processing
6	Research On Stream Join Algorithm And Parallelization Based On Big Data Platform
7	Efficient Similarity Join Over Probabilistic Data Streams Based On Earth Mover’s Distance
8	Stream Data Query Based On Feedback Mechanism
9	Research On Technologies Of Entire Time Network Data Monitor Based On Data Stream Management
10	Research On Multiple Aggregations Over Data Stream