Font Size: a A A

Top-k Query Processing Techniques Over Streaming Data

Posted on:2017-10-23Degree:DoctorType:Dissertation
Country:ChinaCandidate:R ZhuFull Text:PDF
GTID:1368330572465441Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the rapid development of information technology,data stream has become one of the most fundamental type of data nowadays.Compared with traditional data types,data stream is characterized by its fast generation and transmission speed,as well as unknown data distribution.All of these pose tremendous challenges to efficient queries over stream data and real-time management.Top-k:query is one type of important query over data stream.It is widely used in the domain of financial analysis,public opinion monitoring,and so on.Many researchers have studied this problem.However,existing algorithms commonly suffer low efficiency,and are sensitive to data distribution and temporal relation.In many cases,these algorithms fail to efficiently support top-k queries.In this thesis,we deeply study the problem of top-k query over streaming data,and propose a group of novel algorithms to support this types of queries under streaming data environments.The contributions of this thesis are as follows:(1)We first study the problem of continuous top-k:query over sliding window.Aiming at this problem,we propose a novel framework named SAP(Self-Adaptive Partition).Based on SAP,we propose an equal partition algorithm which could reduce the incremental maintenance cost to logarithmic complexity.In addition,we further propose two dynamic partition algorithms,which could self adaptively partition the window so as to cater the challging of data distribution.One benefit is that we could use fewer candidates to support the query.Lastly,we verify the effectiveness of the proposed algorithms through a series of experiments.(2)We then study the problem of continuous top-k query over un-ordered streaming data.To tackle this problem,we propose a novel framework named GSTopK(Group Stack:TopK.We first propose two filter algorithms,which could efficiently prune newly arriving stream data under different data distribution.In addition,we propose a novel data structure,which is used for indexing candidates,i.e.,the ones cannot be pruned.Based on this structure,we propose two algorithms for candidate maintenance.Finally,we verify the effectiveness of our proposed algorithms through a series of experiments.(3)Next,we study the problem of k-range query over streaming data.Since the scale of streaming data is large,we adopt a sampling method that only keeps a small size of probabilistic objects so as to reduce the data scale.In order to index such probabilistic objects,we propose a novel index named HGD-Tree.Based on HGD-Tree,we propose a group of novel strategies to efficiently handle newly arrival objects.In this way,we could efficiently apply the insertion,deletion,and updating on the index.In addition,we propose a self-adptive character extraction algorithm,in which the extracted characters can provide objects with tight probabilistic boundary.We then propose a novel bit-operation based algorithm to support the k-range query.Theoretical analysis and extensive experimental results demonstrate the effectiveness of the proposed algorithms.(4)Last but not the least,we study the problem of continuous top-k query over streaming data with limited memory.We define a novel query named(?,?)--approximate continuous top-k query,which returns approximate answers for top-k query.We firstly propose three pruning algorithms which are used for filtering newly arriving objects.For those objects that are not filtered,we propose an algorithm to compress them if their scores are less the threshold 8.We could make sure that the space cost of our proposed algorithms are unrelated with the window size.Finally,we evaluate the efficiency of our proposed algorithms with extensive experiments.To summarize,aiming at overcoming the disadvatages of the existing algorithms of top-k query over data stream,we study some typical problems,such as continuous top-k query over sliding window,continuous top-k query over disorded streaming data,k-range query in data steam,approaximate continuous top-k query.The study results can make the bases for efficiently processing top-k query over stream data.
Keywords/Search Tags:silding window, continuous top-k query, unorder streaming data, k-range query, approximate continuous top-k query
PDF Full Text Request
Related items