Top-k Complex Event Query Technology For Data Streams

Posted on: 2019-08-23
Degree: Master
Type: Thesis
Country: China
Candidate: G D Chen
Full Text: PDF
GTID: 2428330545473839
Subject: Computer technology
Abstract/Summary:
The rapid development of the Internet industry has brought explosive growth in the scale of data, and Big Data increasingly exhibits streaming characteristics. With their real-time, bursty, volatile, and unbounded nature, data streams pose a huge challenge to traditional databases. Complex event processing exploits the associations between event attributes: it continuously filters massive, consecutively arriving data streams through matching rules or algebraic operations, finds event sequences that satisfy given correlation constraints, and responds quickly. By extracting event sequences that conform to a specific pattern and detecting them in real time, complex event processing can meet the high-throughput and low-latency requirements of massive data processing, which makes it one of the key technologies in data stream processing. As the number of event sources and the number of events to be processed increase, the relationships between events become more and more complex. Under the constraints of single-pass data scanning and limited memory, the study of dynamic analysis techniques and efficient mining algorithms that can adapt to fast, time-varying data streams has become a hot research problem. Researchers have successively proposed data stream Top-k query algorithms for different application scenarios. However, the existing results are still imperfect: most of these algorithms are sensitive to the data stream distribution and to parameter changes. This thesis studies two problems, the dynamic adaptive partition Top-k continuous query and the data stream Top-k dominating query. We summarize our contributions as follows.

(1) Top-k continuous query algorithm based on dynamic adaptive partitioning. Because data streams are fast, continuous, and unbounded, a sliding window is used to process continuous queries over the stream. First, an equal partition strategy is used to divide the window into several disjoint sub-windows. However, equal partitioning causes unnecessary maintenance costs when maintaining candidate sets. To address this problem, a dynamic adaptive partitioning algorithm is further proposed. This algorithm adaptively adjusts the partition size according to the distribution of the data stream and uses the Mann-Whitney rank sum test to detect whether the partition size is appropriate. Global filtering and local filtering are then used to filter out, in advance, objects that make no contribution to the final result set, which reduces communication costs. Finally, the feasibility and efficiency of the algorithm are verified by experiments on real data sets.

(2) Top-k dominating query algorithm for distributed data streams. In the traditional Top-k query the score function is difficult to specify, and in the skyline query the size of the result set is not easy to control. To address these problems, this thesis proposes a Top-k dominating query algorithm for data streams. Top-k dominating queries are powerful queries that combine the advantages of both Top-k and skyline queries. More specifically, they require no user-specified ranking function and can control the size of the result set, and therefore play an important role in decision support and other fields. This thesis adopts a distributed query framework built on Spark Streaming + HDFS and proposes a filter-based Top-k dominating query algorithm. Non-k-skyband objects are efficiently filtered by using the subspace skyline and SKYBT technology, which prunes them in advance and improves the performance of the algorithm. Finally, the time and space performance of the algorithm is verified on real data sets.
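
To make the Top-k dominating query and the role of the k-skyband concrete, the following is a minimal brute-force sketch over a small static set of points. It is not the thesis's filter-based algorithm: it does not use Spark Streaming, the subspace skyline, or SKYBT. The function names (`dominates`, `k_skyband`, `top_k_dominating`) and the convention that smaller values are better in every dimension are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q if p is no worse in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def k_skyband(points: Sequence[Point], k: int) -> List[Point]:
    """Keep only the points dominated by fewer than k other points.

    A point dominated by k or more points cannot be in the Top-k
    dominating result, because each of its dominators has a strictly
    higher dominance score. This makes the k-skyband a safe filter.
    """
    return [p for p in points if sum(dominates(q, p) for q in points) < k]


def top_k_dominating(points: Sequence[Point], k: int) -> List[Tuple[int, Point]]:
    """Return the k points with the highest dominance score.

    The score of a point is the number of points in the full data set
    that it dominates. Candidates are restricted to the k-skyband, but
    scores are still counted against all points.
    """
    candidates = k_skyband(points, k)
    scored = [(sum(dominates(p, q) for q in points), p) for p in candidates]
    # Sort by descending score; break ties by the point itself for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:k]


if __name__ == "__main__":
    data = [(1, 1), (2, 2), (3, 1), (1, 3), (3, 3)]
    print(top_k_dominating(data, 2))
```

In this sketch no score function has to be supplied by the user, and the result size is fixed by `k`, which matches the two properties the abstract attributes to Top-k dominating queries. The pairwise comparison is quadratic in the number of points, so it only illustrates the definitions and says nothing about the performance of the proposed method.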
Keywords/Search Tags: Data streams, Complex event processing, Top-k continuous queries, Top-k dominating queries, Dynamic self-adaptive partition, k-skyband