
Big Data Stream Query Framework And Research On Arithmetic Operator

Posted on: 2017-05-21  Degree: Master  Type: Thesis
Country: China  Candidate: C C Jiang  Full Text: PDF
GTID: 2308330488997066  Subject: Information networks
Abstract/Summary:
Big data poses new challenges for stream query systems. Most early stream query systems are centralized: once a node's memory is saturated, it can no longer process data efficiently, which seriously degrades the results. The join is one of the most important operations in stream query processing and has been studied extensively. The emergence of big data and cloud computing raises new problems for the stream join operator, namely how to route data that join with each other to the same node and how to shed load on that node. Excess data inevitably puts pressure on the system, and if an overloaded system drops data at random, result accuracy suffers.

This thesis analyzes these issues in depth and proposes a new real-time query framework for processing continuous queries. The framework redesigns the group-by, aggregation, join, and other common operators on top of Storm.

Building on a study of the complex join operator, the thesis proposes RMS-Join, a multi-stream equi-join algorithm based on the relational model. First, the algorithm computes a cost model of the stream join in order to reduce the total cost of data distribution and join processing. Then, it sets a primary key so that the data that need to be joined stay on the same node. Finally, it connects the tree model with the load-shed data. Compared with traditional multi-stream join algorithms, RMS-Join effectively improves the efficiency and accuracy of the join operation.

Finally, the thesis presents a locally adaptive load shedding method for large volumes of data. When the system is overloaded, the method computes the DTW of the data streams before they enter the system and drops some of the data according to the overload condition and the characteristics of the data. This preserves the data characteristics as far as possible and so maximizes output accuracy.
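The abstract does not say how DTW is used in the load shedding step. As a rough illustration only, the sketch below assumes DTW means dynamic time warping and uses it to score how closely a shed window still follows the original window. The function names `dtw_distance` and `shed_every_kth`, the absolute-difference cost, and the "drop every k-th value" policy are all hypothetical and are not taken from the thesis.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences.

    Uses absolute difference as the local cost (an assumption made here).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] aligned with an already used b element
                cost[i][j - 1],      # b[j-1] aligned with an already used a element
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


def shed_every_kth(window, k):
    """Hypothetical shedding policy: drop every k-th value of the window."""
    return [x for i, x in enumerate(window) if (i + 1) % k != 0]


if __name__ == "__main__":
    window = [1.0, 1.0, 1.0, 5.0, 9.0, 9.0, 9.0, 2.0]
    # A lower DTW distance means the shed window keeps more of the
    # original shape, so an overloaded node could prefer that policy.
    for k in (2, 3, 4):
        shed = shed_every_kth(window, k)
        print(k, len(shed), dtw_distance(window, shed))
```

Under these assumptions, a shedding component could evaluate several candidate drop policies on a short window and keep the one with the smallest distance that still meets the required drop rate.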
Keywords/Search Tags: Big Data, Stream Query, Stream Join, Load Shedding, Storm