Font Size: a A A

Research On Key Techniques For Top-k Query Processing Over Uncertain Data

Posted on:2020-02-09Degree:DoctorType:Dissertation
Country:ChinaCandidate:H XuFull Text:PDF
GTID:1368330599461808Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
Top-k query plays an important role in information retrieval,biomedical,multi-objective decision support,financial analysis and other fields.Due to the limitation of data acquisition equipment and the delay of network transmission,uncertain data has become ubiquitous in real life,which brings new challenges to data query.In addition,with the popularity of cloud computing,people pay more and more attention to data privacy protection,and need to make a careful balance between data security and availability.In view of the challenges faced by top-k query processing of uncertain data,the research work on these problems is carried out from the following three dimensions.Firstly,aiming at the problem of high query complexity caused by the diversity of query semantics on uncertain data,a formal definition of top-k query based on expected editing distance is given,and a new mapping distance of distance semantics probability n-gram set is proposed.Then,combining query semantics with the characteristics of probabilistic n-gram,a novel semantics called gram mapping distance of probability n-gram is proposed.Based on the gram mapping distance,an approximate matching filtering algorithm is proposed.The algorithm is further optimized by establishing a multi-level inverted index based on probabilistic n-gram and a frequency matrix.By analyzing the time complexity of the algorithm and evaluating the performance of a series of comparative experiments,it is proved that the algorithm achieves efficient processing of top-k queries of uncertain strings.Compared with the baseline algorithm,it has higher pruning filtering ability and better scalability.Secondly,aiming at the query performance problem of large-scale uncertain data query processing caused by the expansion of query candidate sets,a parallel top-k query processing algorithm of uncertain string based on MapReduce framework is proposed.Firstly,based on the mapping distance of probabilistic n-gram,a new lower bound of expected editing distance is proposed.By seamlessly combining this lower bound with threshold algorithm,the efficiency of filtering and pruning candidate items in Map stage is effectively improved.Thus,on the basis of guaranteeing the correct query rate,the computation cost of local nodes and the communication cost between Map nodes and Reduce nodes are greatly reduced.Through a series of comprehensive comparative experiments on real data sets and synthetic data sets,the results show that the speed of this algorithm is six times faster than that of the basic parallel algorithm in the best case,and the top-k query of uncertain strings on large-scale data sets has remarkable acceleration performance,and it has good scalability under different experimental parameters.Finally,aiming at the privacy security problems caused by data sharing in cloud environment,a top-k query processing algorithm for uncertain data based on differential privacy is proposed.Firstly,a formal definition of top-k query problem for multi-dimensional uncertain data based on differential privacy is proposed.Then,a R-tree index satisfying differential privacy is established locally,and a content addressable network is formed between servers to construct a distributed two-level index.Finally,theoretical analysis and experimental evaluation are carried out to verify the proposed method.The algorithm satisfies ?-differential privacy and has good scalability.Under the same principle of differential privacy,the data availability of the algorithm is improved by about 20% compared with the existing method Quad-opt.In summary,the top-k query processing over uncertain data is studied from different dimensions,such as query filtering performance and user privacy data security,in order to form a series of top-k query processing optimization algorithms for uncertain data,which can effectively improve the query performance of various uncertain data in multiple practical application scenarios.
Keywords/Search Tags:Top-k Query, Uncertain Data, MapReduce, Differential Privacy
PDF Full Text Request
Related items