Font Size: a A A

The Research Of Key Techniques Of Uncertain TOP-K Query Processing

Posted on:2018-02-09Degree:DoctorType:Dissertation
Country:ChinaCandidate:G Q XiaoFull Text:PDF
GTID:1318330542983713Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the progress of human society and the development of network technology,data in-formation has been a significant strategy resource as well as substance and energy.In this era of information explosion,it is urgent to extract the key information from the big data sets and pro-vide decision support for customers.Top-k query,as an important data management operator,plays a significant role in multi-criteria decision support,market analysis,environmental mon-itoring and so on.In addition,uncertain data exists extensively in may areas,such as wireless sensor network(WSN),location-based services(LBS),radio frequency identification(RFID),web service,etc.This is due to the limitations of data collection equipment,requirements of privacy protection or network transmission delay and so on.Uncertain data management has become an important research hotspot in databases.Although there exist many works on tra-ditional top-em k queries for deterministic data sets,they can not be directly applied to deal with uncertain data.Compared with traditional top-em k queries,the calculation mode of the uncertain top-k query is more complex,the data type is more diverse,and it needs to adapt to different application scenarios.In this dissertation,the key techniques of uncertain top-k query are researched,and the main contributions are as follows:(1)Propose the top-k range query processing algorithm for uncertain databases.The top-k range reporting problem has been studied in various practical applications.Although the traditional top-k range query based on deterministic data sets has been well studied,the solution can not be applied directly to the processing of uncertain data due to the intrinsic differences between unccrtain and certain data.Motivated by this,this paper first elaborate a new and important query in the context of an uncertain database,namely uncertain top-(k,l)range(UTR)query,which retrieves l uncertain tuples that are expected to meet a score range constraint and have the maximum top-k probabilities but no less than a user-specified probability threshold.In order to enable the UTR query answer faster,some effective pruning rules are presented to reduce the UTR query space,which are integrated into an efficient UTR query procedure.What's more,to improve the efficiency and effectiveness of the UTR query,a parallel UTR(PUTR)query algorithm based on OpenMP is presented.Extensive experiments have verified the efficiency and effectiveness of our proposed algorithms.It is worth to notice that,the UTR query can capture those important tuples missed by the PT-k query and Global-Topk query.Additionally,comparing to the UTR query,the PUTR query performs much more efficiently and effectively with the super linear speedup.(2)Present the probabilistic reverse top-k query processing algorithm for uncertain databas-es.Top-k queries have been mainly studied from the perspective of the user,focusing primarily on efficient query processing.Reverse top-k queries are proposed from the perspective of the product manufacturer,which are essential for manufacturers to assess the potential market and impact of their products based on the competition.This paper models firstly probabilistic re-verse top-k queries over uncertain big data for the discrete situation,in both monochromatic and bichromatic cases,denoted by MPRT and BPRT queries,respectively.This paper deter-mines the partitions of solution space of MPRT queries and provides in theory a mathematical model for solving arbitrary dimensional data space.Additionally,effective pruning heuristics are proposed to reduce the search space of BPRT queries.Moreover,efficient query procedures are presented seamlessly with integration of the proposed pruning strategies.Extensive experi-ments using both real-world and synthetic data sets demonstrate the efficiency and extendibility of our proposed approaches with various experimental settings.(3)Propose the probabilistic top-l influential query processing algorithm for uncertain databases.Based on PRT queries,this papers formulates a probabilistic top-l influential query,that reports the l most influential objects having the largest impact factors,where the impact factor of an object is defined as the cardinality of its probabilistic reverse top-k query result set.Effective pruning heuristics are presented for speeding up the queries.Particularly,this pa-per exploits several properties of probabilistic threshold top-k queries and probabilistic skyline queries to reduce the search space of this problem.In addition,an upper bound of the potential users is estimated to reduce the cost of computing the probabilistic reverse top-k queries for the candidate objects.Finally,efficient query algorithms are presented seamlessly with integra-tion of the proposed pruning strategies.Theoretical analysis and extensive experiments using both real-world and synthetic data sets demonstrate the efficiency and effectiveness of proposed algorithms.(4)Discuss the queue analysis of continuous query processing for uncertain data stream-s.With the rapid development of data collection methods and their practical applications,the management of uncertain data streams has drawn wide attention in both academia and indus-try.System capacity planning and Quality of service(QoS)metrics are two very important problems for data stream management systems(DSMSs)to process streams efficiently due to unpredictable input characteristics and limited memory resource in the system.Motivated by this,in this paper,we explore an effective approach to estimate the memory requirement,data loss ratio,and tuple latency of continuous queries for uncertain data streams over sliding win-dows in a DSMS.More specifically,we propose a queueing model to address these problems in this paper.We study the average number of tuples,average tuple latency in the queue,and the distribution of the number of tuples and tuple latency in the queue under the Poisson arrival of input data streams in our queueing model.Furthermore,we also determine the maximum capacity of the queueing system based on the data loss ratio.The solutions for the above prob-lems are very important to help researchers design,manage,and optimize a DSMS,including allocating buffer needed for a queue and admitting a continuous uncertain query to the system without violation of the pre-specified QoS requirements.The proposed algorithms and methods in this dissertation not only have certain theoretical value,enrich the research contents of data management,but also promote the practical process of uncertain data management,which have great application value and practical significance.
Keywords/Search Tags:Uncertain Data, Query Processing, Top-k Queries, Range Queries, Reverse Top-k Queries, Top-l Influential Queries
PDF Full Text Request
Related items