Distributed Storage Query Optimization For Massive Spatial Data

Posted on: 2018-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: C Yang
GTID: 2348330536974497
Subject: Software engineering
Abstract/Summary:
Location-based services provide users with a variety of real-time services based on their location. They are both an important form of service and the foundation of many real-world applications, for example location-based smart tourism recommendation, pushing urban public services within a specific space-time domain, and locating consumer groups in business districts in real time. These applications must quickly query the spatial objects related to a location and then apply efficient similarity computation to deliver real-time recommendations and information pushing. Query performance on large-scale spatial objects is therefore an important factor in providing real-time services. However, there is a sharp conflict between the large scale and diversity of spatial objects on the one hand and the continuous, heavy-load, real-time query requirements on the other, which poses a major challenge for managing and accessing massive spatial objects. Aiming at the real-time requirements of location-related services, we conducted an in-depth study of storage and access optimization for massive spatial objects. Considering the spatial object query requirements of location-related applications, we introduced the GeoHash geocoding system to capture the spatial features of spatial objects, and fully exploited the performance advantages of the distributed in-memory computing architecture to design a distributed storage model and a range query model for spatial objects that make full use of both the key-value storage structure and the column storage schema. To further improve the real-time response of various applications, we designed a distributed, multi-level index structure based on the large memory and multi-core features of the distributed in-memory computing architecture, which optimizes access performance for large-scale spatial objects. The effectiveness and efficiency of the proposed model are demonstrated through both theoretical analysis and experimental verification. The specific research contents and achievements are as follows.

First, a group of large-scale, extensive experiments was designed to evaluate the performance of traditional databases and large-scale data computing platforms. Considering the differences in processing architecture, storage medium, and query strategy, we designed experiments to evaluate the primary performance-related factors, such as indexing, memory, and disk I/O. Based on the experimental analysis, we summarized the applicability of each platform to different tasks, which provides a factual basis for platform selection and optimization in large-scale data computing.

Second, aiming at the storage requirements and real-time access constraints of massive spatial objects, we comprehensively analyzed the advantages and disadvantages of existing distributed storage systems. Based on spatial object coding technology, we proposed an improved distributed storage model for spatial data, which uses GeoHash coding to transform a two-dimensional spatial object into a one-dimensional string key and integrates that key with the key-value storage model and the column storage schema to provide a new underlying data storage structure. The proposed model can store spatial objects with different numbers of attributes and supports query optimization for different column families. At the same time, based on the property that "coding similarity corresponds to spatial similarity" implied by the spatial object coding strategy, we put forward a "high-precision coding, low-precision storing" management strategy that exploits the multi-version data management mechanism. A theoretical analysis of the relation between coding precision and query performance is also presented. Extensive experimental results show that the proposed model has good storage scalability and query performance for spatial objects.

Third, to further improve real-time query response on massive spatial objects, we exploited the performance advantages of the distributed in-memory computing platform to design an in-memory storage model and a query optimization architecture for massive spatial objects. Based on the integration of GeoHash coding and the key-value storage model, we used the RDD structure to organize spatial objects in memory, which balances random-access characteristics against the need to manage memory space effectively, and optimizes data access performance for different applications through a local data storage strategy. Both hardware performance and storage strategy are taken into account to maximize spatial object query performance. Furthermore, a trie index structure is designed for the distributed in-memory environment to support an efficient distributed query architecture. Extensive experimental results show that the in-memory storage model meets real-time query requirements under different query loads while ensuring the scalability of data storage.
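The abstract relies on GeoHash to turn a two-dimensional position into a one-dimensional string key. As an illustration, here is a minimal sketch of the standard public GeoHash encoding (alternately bisecting the longitude and latitude ranges, starting with longitude, and packing every 5 bits into one base32 character). It is not the thesis's own code; the function name `geohash_encode` is hypothetical.

```python
# Standard GeoHash base32 alphabet (omits a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a (lat, lon) point as a GeoHash string of `precision` characters.

    Hypothetical helper illustrating the public GeoHash algorithm.
    """
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0          # bits accumulated for the current base32 character
    bit_count = 0
    even = True       # even bit positions refine longitude, odd ones latitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1   # point lies in the upper half
            rng[0] = mid
        else:
            bits = bits << 1         # point lies in the lower half
            rng[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:           # 5 bits -> one base32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(42.6, -5.6, 5))  # ezs42
```

Each extra character narrows the cell, so a longer code for the same point always starts with the shorter one, and nearby points usually share a prefix. This is the "coding similarity corresponds to spatial similarity" property the abstract mentions. Points on opposite sides of a cell boundary can still get different prefixes, so range queries typically also inspect neighbouring cells.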
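The abstract names a "high-precision coding, low-precision storing" strategy but does not describe its mechanics. One possible reading, assumed here for illustration only, is that each object keeps a long GeoHash code while rows are keyed by a short prefix of it. Objects in the same coarse cell then share one row, loosely like multiple versions of a row in a column store, and a range query becomes a prefix scan over sorted row keys. The class `PrefixRowStore` and the parameter `row_precision` are hypothetical, and a sorted Python list stands in for a distributed sorted key-value store.

```python
import bisect
from collections import defaultdict


class PrefixRowStore:
    """Hypothetical sketch: high-precision codes stored under low-precision row keys."""

    def __init__(self, row_precision: int):
        self.row_precision = row_precision
        self.rows = defaultdict(list)   # row key -> [(full_code, object_id), ...]
        self.sorted_keys = []           # row keys kept in lexicographic order

    def put(self, full_code: str, object_id: str) -> None:
        # The row key is only the first `row_precision` characters of the code.
        row_key = full_code[: self.row_precision]
        if row_key not in self.rows:
            bisect.insort(self.sorted_keys, row_key)
        self.rows[row_key].append((full_code, object_id))

    def prefix_query(self, prefix: str) -> list:
        """Return ids of objects whose full GeoHash starts with `prefix`."""
        row_prefix = prefix[: self.row_precision]
        start = bisect.bisect_left(self.sorted_keys, row_prefix)
        result = []
        for key in self.sorted_keys[start:]:
            if not key.startswith(row_prefix):
                break  # keys sharing a prefix are contiguous in sorted order
            for full_code, object_id in self.rows[key]:
                # Refine with the high-precision code when the query is finer
                # than the row key.
                if full_code.startswith(prefix):
                    result.append(object_id)
        return result


store = PrefixRowStore(row_precision=4)
store.put("ezs42bcd", "a")
store.put("ezs42xyz", "b")
store.put("ezs4zzzz", "c")
store.put("u4pruydq", "d")
print(store.prefix_query("ezs42"))  # ['a', 'b']
```

A shorter row key keeps fewer, fuller rows and makes coarse scans cheap. The retained full code still allows fine-grained filtering, which is one way the trade-off between coding precision and query performance can show up.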
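The abstract also mentions a trie index over GeoHash-coded objects in a distributed in-memory environment. The sketch below shows the general idea. Each GeoHash character is one trie level, so a prefix (a spatial cell) maps to one subtree. Routing codes to partitions by their first character is an assumption made here to suggest the distributed aspect; it is not taken from the thesis. All class and method names are hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


class TrieNode:
    __slots__ = ("children", "ids")

    def __init__(self):
        self.children = {}  # next GeoHash character -> TrieNode
        self.ids = []       # objects whose code ends exactly at this node


class GeoHashTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, code: str, object_id: str) -> None:
        node = self.root
        for ch in code:
            node = node.children.setdefault(ch, TrieNode())
        node.ids.append(object_id)

    def prefix_query(self, prefix: str) -> list:
        # Walk down to the node for the query cell.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every id in the subtree under that cell (iterative DFS).
        result, stack = [], [node]
        while stack:
            current = stack.pop()
            result.extend(current.ids)
            stack.extend(current.children.values())
        return result


class PartitionedTrieIndex:
    """Hypothetical: one trie per partition, routed by the code's first character."""

    def __init__(self, num_partitions: int):
        self.partitions = [GeoHashTrie() for _ in range(num_partitions)]

    def _partition(self, code: str) -> GeoHashTrie:
        return self.partitions[BASE32.index(code[0]) % len(self.partitions)]

    def insert(self, code: str, object_id: str) -> None:
        self._partition(code).insert(code, object_id)

    def prefix_query(self, prefix: str) -> list:
        if not prefix:  # an empty prefix must visit every partition
            return [i for p in self.partitions for i in p.prefix_query("")]
        return self._partition(prefix).prefix_query(prefix)


index = PartitionedTrieIndex(num_partitions=4)
for code, oid in [("ezs42bcd", "a"), ("ezs42xyz", "b"), ("ezs4zzzz", "c"), ("u4pruydq", "d")]:
    index.insert(code, oid)
print(sorted(index.prefix_query("ezs4")))  # ['a', 'b', 'c']
```

Because every non-empty prefix query touches only one partition under this routing, each lookup stays local to one worker. Partitioning on a single leading character can be skewed when data clusters geographically, though, so a real deployment would likely need a finer or load-aware routing rule.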
Keywords/Search Tags: Distributed Architecture, Memory Data Management, Range Query, Key-Value