
Research on Query Processing Technology for Geospatial Big Data Based on Spark

Posted on: 2018-11-29
Degree: Master
Type: Thesis
Country: China
Candidate: X L Wang
GTID: 2348330518498650
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology, we have entered the era of big data. Numerous smartphone applications and a wide variety of Internet of Things projects produce large volumes of spatial data, and handling geospatial data efficiently is of particular importance in these applications. Querying and analyzing such large amounts of spatial data quickly and efficiently has become a difficult challenge. Spark, a new general-purpose distributed parallel computing framework based on in-memory computing, has been highly successful thanks to its performance advantage in fast and efficient big data processing. It provides the Resilient Distributed Dataset (RDD) abstraction, which keeps data in memory while also simplifying the development of distributed parallel programs.

Therefore, after a thorough study of the technologies related to spatial data query processing, this thesis designs and implements GS-Spark, a prototype platform for geospatial data query and processing based on Spark. The platform extends Spark to support spatial data types, spatial indexes, and efficient query analysis. Its architecture consists of three layers: the spatial data storage layer, the presentation layer, and the query layer. The spatial data storage layer implements data storage and constructs a two-layer distributed spatial index that adopts the R-tree and the quad-tree index. The spatial data presentation layer designs an RDD for representing spatial data and an Index Geo RDD for representing distributed spatial index data. The spatial data query layer, built on the presentation layer, implements several important spatial query operations, including range query, k nearest neighbor (kNN) query, and spatial join query.

The specific work is as follows:
(1) This thesis studies and analyzes the data processing technologies involved in implementing the platform, including spatial data partitioning and STR R-tree index construction.
(2) Based on a study of distributed indexing technology, this thesis designs a spatial index structure that fits the distributed parallel programming model and implements it on Spark. Compared with other spatial index construction programs such as Spatial Hadoop, GS-Spark is more efficient.
(3) This thesis analyzes distributed range query, kNN query, and spatial join query techniques and implements these algorithms on the Spark platform. Compared with the existing spatial data query program Spatial Hadoop, GS-Spark shows higher performance.

In summary, GS-Spark fully supports query processing of geospatial data. In preliminary experiments, GS-Spark performed better than Spatial Hadoop, with the advantage most evident in iterative query analysis.
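To make the two-layer idea concrete, the following is a minimal sketch in plain Python, not the GS-Spark implementation. It assumes 2-D point data, uses Sort-Tile-Recursive (STR) packing to cut the points into tiles, keeps one bounding rectangle per tile as a stand-in for the global layer, and scans the points of each surviving tile as a stand-in for the local layer. All names (`Rect`, `str_partition`, `build_index`, `range_query`) are hypothetical and chosen for illustration; in a Spark setting each tile would presumably correspond to one partition with its own local index.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle (hypothetical helper type for this sketch)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "Rect") -> bool:
        return not (other.xmin > self.xmax or other.xmax < self.xmin
                    or other.ymin > self.ymax or other.ymax < self.ymin)

    def contains(self, p: Point) -> bool:
        return self.xmin <= p[0] <= self.xmax and self.ymin <= p[1] <= self.ymax


def mbr(points: List[Point]) -> Rect:
    """Minimum bounding rectangle of a non-empty point list."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return Rect(min(xs), min(ys), max(xs), max(ys))


def str_partition(points: List[Point], capacity: int) -> List[List[Point]]:
    """STR packing: sort by x, cut into vertical slices, sort each slice
    by y, then cut each slice into tiles of at most `capacity` points."""
    n = len(points)
    if n == 0:
        return []
    num_tiles = math.ceil(n / capacity)
    num_slices = math.ceil(math.sqrt(num_tiles))
    slice_size = num_slices * capacity
    by_x = sorted(points)
    tiles = []
    for i in range(0, n, slice_size):
        vertical = sorted(by_x[i:i + slice_size], key=lambda p: p[1])
        for j in range(0, len(vertical), capacity):
            tiles.append(vertical[j:j + capacity])
    return tiles


def build_index(points: List[Point], capacity: int) -> List[Tuple[Rect, List[Point]]]:
    """Global layer: one (bounding rectangle, tile) entry per STR tile."""
    return [(mbr(tile), tile) for tile in str_partition(points, capacity)]


def range_query(index: List[Tuple[Rect, List[Point]]], window: Rect) -> List[Point]:
    """Prune tiles with the global layer, then filter points locally."""
    hits = []
    for box, tile in index:
        if box.intersects(window):  # global pruning step
            hits.extend(p for p in tile if window.contains(p))
    return hits
```

The design point the sketch illustrates is that the global layer lets a range query skip every tile whose bounding rectangle misses the query window, so only a fraction of the data is examined. Replacing the per-tile linear scan with a local tree index would be the natural next step.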
Keywords/Search Tags: Spark, Distributed computing, Geospatial data, Spatial index, Spatial data query