
Research Of Query And Analysis Technology For Spatio-temporal Big Data Based On Spark

Posted on: 2019-12-11
Degree: Master
Type: Thesis
Country: China
Candidate: M Z Su
Full Text: PDF
GTID: 2428330572955620
Subject: Computer system architecture
Abstract/Summary:
With the rapid development of Geographic Information Systems and mobile applications, the amount of spatio-temporal data has increased dramatically in recent years. Spatio-temporal data is large in scale, fast-growing, and complex in structure. Even high-performance computers can hardly cope with querying and analyzing such massive spatio-temporal data. Although the Spark distributed computing platform is widely used for parallel processing of massive datasets, it does not support spatio-temporal data directly. In view of these problems, this thesis studies, on the basis of Spark, the key technologies of spatio-temporal processing, including range query, kNN query, predictive analysis, and cluster analysis, and builds the ST-Spark system, which meets the requirements of storage and query performance. The main work is as follows:
(1) Given the unstructured nature and large volume of spatio-temporal data, this thesis designs a spatio-temporal data model and a grid index model on the Cassandra distributed database that are suited to Spark. Indexes are stored separately from data. The data model ensures data locality by storing each trajectory's time series on a node in sorted order.
(2) Based on the spatio-temporal data storage model and index model, spatio-temporal range query and k-nearest neighbor (kNN) query algorithms on Spark are proposed. Cassandra's server-side filtering is used to address the inefficiency of coarse filtering. A "grid extension" algorithm is proposed to determine a grid set that contains more than k spatio-temporal objects.
(3) Based on the theory of polynomial fitting in linear regression analysis, this thesis designs and implements a trajectory prediction method based on sliding-window polynomial fitting, and implements predictive spatio-temporal range queries based on the predicted trajectory.
(4) Based on the DBSCAN algorithm, an ST-DBSCAN algorithm on Spark is proposed through a three-dimensional extension, which can be summarized as "uniform partition, local ST-DBSCAN, global merge and relabel". The main idea is to divide the spatio-temporal dataset evenly, expand each partition's data according to the parameters, run local ST-DBSCAN in parallel, and then merge globally and relabel the dataset based on the points contained in the outer partitions.
Finally, the T-Drive and GDELT datasets are used to evaluate ST-Spark's performance in a cluster environment. The experimental results show that ST-Spark outperforms some other systems of the same type in query performance, achieves high accuracy with small error in predictive analysis, and is effective in ST-DBSCAN analysis.
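The "grid extension" step of the kNN query in item (2) can be illustrated with a minimal single-machine sketch. It is not the thesis's Spark and Cassandra implementation: it assumes a purely spatial 2D uniform grid (the time dimension is omitted), and the names `CELL`, `cell_of`, `build_grid`, `ring`, and `knn` are hypothetical. The sketch extends the searched cell set ring by ring until it holds at least k candidates, then keeps extending far enough that no closer point can lie outside the scanned cells.

```python
import math
from collections import defaultdict

CELL = 1.0  # hypothetical grid cell size (an assumption, not from the thesis)

def cell_of(x, y, cell=CELL):
    """Map a coordinate to the integer id of its grid cell."""
    return (math.floor(x / cell), math.floor(y / cell))

def build_grid(points, cell=CELL):
    """Grid index: cell id -> list of points falling in that cell."""
    grid = defaultdict(list)
    for p in points:
        grid[cell_of(p[0], p[1], cell)].append(p)
    return grid

def ring(center, r):
    """Cells whose Chebyshev distance from `center` is exactly r."""
    cx, cy = center
    if r == 0:
        return [(cx, cy)]
    cells = []
    for dx in range(-r, r + 1):          # bottom and top rows
        cells.append((cx + dx, cy - r))
        cells.append((cx + dx, cy + r))
    for dy in range(-r + 1, r):          # left and right columns
        cells.append((cx - r, cy + dy))
        cells.append((cx + r, cy + dy))
    return cells

def knn(grid, q, k, cell=CELL):
    """Return the k points nearest to q using ring-by-ring grid extension."""
    if k <= 0:
        return []
    center = cell_of(q[0], q[1], cell)
    # Farthest occupied ring; extending beyond it cannot add candidates.
    limit = max((max(abs(a - center[0]), abs(b - center[1])) for a, b in grid),
                default=0)
    dist = lambda p: math.dist(q, p[:2])
    candidates, r = [], 0
    # Coarse step: extend the cell set until it holds at least k objects.
    while len(candidates) < k and r <= limit:
        for c in ring(center, r):
            candidates.extend(grid.get(c, ()))
        r += 1
    if len(candidates) >= k:
        # Any point outside rings 0..R is farther than R * cell from q, so
        # scanning up to ceil(kth / cell) rings guarantees a correct result.
        kth = sorted(map(dist, candidates))[k - 1]
        needed = min(limit, math.ceil(kth / cell))
        while r <= needed:
            for c in ring(center, r):
                candidates.extend(grid.get(c, ()))
            r += 1
    # Refinement step: exact distances over the candidate set only.
    return sorted(candidates, key=dist)[:k]
```

For example, `knn(build_grid(points), (1.0, 1.0), 2)` returns the two points nearest to (1.0, 1.0). In a distributed setting, the candidate cell set would presumably drive which index entries are read, but that mapping is outside the scope of this sketch.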
Keywords/Search Tags:Spark, Distributed computing, Spatio-temporal data, Spatio-temporal query, Spatio-temporal analysis