Font Size: a A A

Optimization Of Spatial-textual Queries

Posted on:2022-03-20Degree:DoctorType:Dissertation
Country:ChinaCandidate:R YangFull Text:PDF
GTID:1488306542473904Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
With the development of location-aware devices and technologies,and the prevalence of location-based applications,spatial-textual data with both spatial and textual attributes is being generated at an unprecedented speed and scale.These data is also called spatial-textual object(abbreviated as object).Spatial-textual queries(STQ for short)retrieve the high-precision results satisfying the spatial and textual constraints on these spatial-textual objects,and are the core operations of numerous location-based services.Evaluation and optimization of spatial-textual queries is a major research branch in the field of spatial data management.Spatial-textual indexes and data reduction techniques are two major optimization approaches to eliminating a large number of unpromising objects and reducing the number of objects to be verified in order to improve the efficiency of query evaluation.For the STQ optimization approach utilizing the spatial-textual indexes,it is necessary to select the appropriate spatial and textual indexes,and construct the efficient mechanisms that map the spatial and textual attributes to the indexes to improve the index filtering performance,reduce the number of objects to be verified,and avoid high verification costs.This approach is applicable to those queries whose spatial and textual attributes can be mapped to the indexes,while with the major problem that the stata-of-the art spatial-textual indexes are lack of effciency and adaptability,which resulting in the update cost is prohibitively high if the spatial-textual data frequently updates.The reason is that they usually are constructed based on static datasets,and inherently have strong filtering performance but high update costs.To solve this problem,this thesis takes as an example the typical continuous k-nearest neighbor queries over spatial-textual data streams(Ck QST for short)where the index needs to be updated frequently,and studies the STQ optimization by constructing a self-adaptive model as the core mapping mechanism of spatial-textual indexes to balance the filtering ability and update cost of the index.For the STQ optimization approach utilizing the data reduction techniques,the spatial or textual attributes of queries usually could not be mapped to the index,resulting in the filtering ability indexes cannot be fully utilized and the size of data to be verified is large.The data reduction techniques group the data,and select some representative data satisfying the query constraints in each group,and filter the other data to avoid costly verification at the cost of a small loss of precision.This approach is applicable to those queries whose spatial or textual attributes cannot be mapped to the index.It has an associated problem that the performance varies greatly against different datasets,which cannot guarantee the filtering capability and accuracy of the results and lacks universality.To solve the problem,this thesis takes as an example the typical NP-Hard keyword-aware optimal route queries(KORQ)whose spatial attributes cannot be mapped to the index,constructs a model describing the relationship between the filtering capability and accuracy of the results,develops the data reduction techniques with adjustable query performance to meet the varied query response time and accuracy requirements.The main contributions follow.(1)To evaluate Ck QST on the dynamic data stream,a spatial and textual index mapping mechanisms are proposed.First,a verification and update of memory-based cost model(VUMBCM for short)is proposed,which calculates the optimal mapping nodes corresponding to the spatial search range of the queries,and balances the verification cost of the queries and update cost of the index.Second,a basic block based ordered inverted index is poprosed to determine the number of keywords for constructing the posting list in inverted index,shortenes the length of the original posting list,and can quickly locate the queries to be verified.Finally,a batch mapping strategy is proposed to improve the data throughput,which maps multiple objects containing common keywords to the corresponding blocks of the posting list,and processes the objects in a batch with a shared scan.To evaluate Ck QST,the index integrates Quadtree with VUMBCM,and the basic ordered inverted index,and is called OIQ-tree.The average incoming object processing time for OIQ-tree decreases by 22%,and the average index updating time caused by object updating decreases by 46%,when the number of Ck QST reaches 20 millions,compared with that of the state-of-the-art techniques.(2)To evaluate Ck QST on the highly dynamic data streams with frequent object updating,a spatial and textual mapping mechanism is proposed consisting of two techniques.The first is a cost-based k-skyband re-evaluation technique,which judiciously selects the spatial search range for the queries to reduce the re-evaluation cost caused by frequent object updating based on the update frequency and the load of the objects.The second is an adaptive block based inverted ordered index,which adaptively determines the number of queries in blocks by considering the textual distribution of queries and objects to avoid the problem that the number of queries in blocks is too many or too few when the data distribution is skewed.To evaluate Ck QST,the index integrates OIQ-tree with the cost-based k-skyband re-evaluation technique,and the adaptive ordered inverted index,and is called AOIQ-tree.The average incoming object processing time for AOIQ-tree decreases by 36%,and the average index updating time caused by object updating decreases by 61%,when the number of Ck QST reaches 20 millions,compared with that of the state-of-the-art techniques.(3)To evaluate the KORQ whose spatial attributes cannot be mapped to the index,an approximate algorithm based on the hierarchical data sampling and reduction techniques are proposed according to the features of the expanded path objective value,namely,high-precision sampling by scaling the path objective value,low-precision sampling by reducing the path objective value,and fixed sampling by clustering the path objective value.They reduce the number of paths between the starting vertex and expanded vertex,the paths between any two vertexes,and the vertexes covering the query keywords being verified,respectively.These three techniques greatly reduce the number of the paths and vertexes to be expanded at different degree,and improve the query efficiency.The execution time is reduced by more than 76% on average,compared with that of the state-of-the-art techniques.The above index mapping mechanisms and data reduction techniques are applicable to solveing STQ with similar features,and provide references to the optimization of other queries.
Keywords/Search Tags:query optimization, spatial-textual index, continuous query over data streams, data reduction, keyword-aware route queries
PDF Full Text Request
Related items