In response to the soaring demand for big data analysis, a series of big data analysis architectures have emerged. They are designed to support fast, easy-to-use, and scalable computing. Among them, distributed in-memory computing architectures have proven more capable than MapReduce-based architectures. Applying distributed in-memory computing architectures to geo-analysis therefore opens a new direction for research on high-performance geo-computation. However, neither MapReduce-based architectures nor distributed in-memory computing architectures natively support spatial data or spatial analysis. Supporting geo-computation on in-memory computing architectures therefore requires a chain of improvements, including spatial data management, spatial index construction, and spatial query. On the basis of these arguments, this paper takes advantage of distributed in-memory computing architectures and focuses on spatial vector data and its optimization strategies.

(1) A fast data partitioning method based on random sampling and adaptive Hilbert curve encoding is proposed. Data partitioning is the first processing step after data are imported into the system; it distributes the whole dataset across the nodes. Because these data are spatial, placing spatially adjacent objects on the same node is important for later processing, so Hilbert curve codes are used for partitioning. Because data distributions are uneven, however, the appropriate level of the Hilbert curve differs from dataset to dataset. To determine the optimal level, the iterative computation is run on a random sample, which reduces I/O and accelerates the process. The method also resolves the data skew and boundary-object problems.

(2) The compute-intensive buffer analysis algorithm is parallelized and optimized with three methods: approximate splitting, grid-based accumulative data decomposition, and given-depth tree-like merging. Experiments show that the optimized method outperforms the buffer tools in prevailing commercial GIS software as well as earlier parallel buffer strategies. Buffer analysis is one of the most basic spatial analysis methods, and its parallelization strategy plays a pivotal role in accelerating the computation. However, traditional parallel methods are usually based on MPI (Message Passing Interface) and are therefore limited in scalability. Worse still, they do not take the spatial properties of the data into account. The optimized parallel buffer method therefore consists of three steps. First, spatial adjacency is preserved by partitioning the data with Hilbert space-filling curve codes. Second, the data are decomposed into small data bricks with load balance taken into account. Finally, after buffers are generated within each data brick, the results are combined by given-depth tree-like merging.

(3) A parallel aggregation index is proposed that improves the efficiency of precise aggregation queries compared with the other indexing methods supported by current distributed in-memory computing architectures. Building on it, a parallel approximate aggregation index is designed for the same query; it returns the result together with a confidence interval at a given confidence level. Spatial aggregation queries are widely used when only a combined result, rather than information about every individual object, is required. This paper focuses on rectangle aggregation queries. First, an aggregation R-tree is built that benefits from Hilbert partitioning and parallelism; combined with a grid index serving as the global index, it forms a two-layer index for precise aggregation queries. Second, random sampling is fully exploited, since it makes computation on massive data more efficient. Multi-layer indexes are created from multi-layer random samples, and the query is executed layer by layer, returning an approximate result at each layer. The precision of the results therefore increases as the query proceeds, and users can rely on the confidence intervals returned with the results to decide when to stop the query.

(4) The HiStream system is designed and implemented for online spatio-temporal hotspot analysis. It provides an array of analysis tools at the POI (Point of Interest) and ROI (Range of Interest) scales. In addition to the optimization strategies above, HiStream benefits from HTML5-based data visualization techniques, which give users a friendly interactive interface. Its basic function is real-time hotspot computation and visualization, with responses in less than 1 s even on data at the 100-million-record scale. It also offers many tools for efficient, exploratory spatio-temporal pattern discovery. Three pattern-discovery examples are presented and explained as illustrations.
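The Hilbert-curve partitioning described in (1) can be sketched as below. This is a minimal sketch under stated assumptions, not the thesis implementation. The stopping rule for the adaptive level ("raise the order until no cell holds more than `max_cell_fraction` of the sample") and the use of sample quantiles as range boundaries are illustrative choices. All function and parameter names (`build_partitioner`, `max_cell_fraction`, and so on) are hypothetical. The `hilbert_index` routine is the standard iterative (x, y) to d conversion.

```python
import random
from bisect import bisect_right
from collections import Counter


def hilbert_index(order, x, y):
    """Hilbert index of integer cell (x, y) on a 2**order x 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/reflect so the sub-quadrant curve has the right orientation
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def to_cell(pt, bbox, order):
    """Snap a point to integer grid coordinates, clamping points outside bbox."""
    minx, miny, maxx, maxy = bbox
    n = 1 << order
    cx = int((pt[0] - minx) / (maxx - minx) * n)
    cy = int((pt[1] - miny) / (maxy - miny) * n)
    return min(max(cx, 0), n - 1), min(max(cy, 0), n - 1)


def choose_order(sample, bbox, max_cell_fraction, max_order=16):
    """Hypothetical rule: raise the order until no cell holds too much of the sample."""
    for order in range(1, max_order + 1):
        counts = Counter(to_cell(p, bbox, order) for p in sample)
        if max(counts.values()) <= max_cell_fraction * len(sample):
            return order
    return max_order


def build_partitioner(sample, num_partitions, max_cell_fraction=0.01):
    """Derive curve order and range boundaries from the sample alone (no full scan)."""
    xs, ys = [p[0] for p in sample], [p[1] for p in sample]
    bbox = (min(xs), min(ys), max(max(xs), min(xs) + 1e-9), max(max(ys), min(ys) + 1e-9))
    order = choose_order(sample, bbox, max_cell_fraction)
    codes = sorted(hilbert_index(order, *to_cell(p, bbox, order)) for p in sample)
    # Sample quantiles of the Hilbert codes become the partition boundaries.
    bounds = [codes[len(codes) * i // num_partitions] for i in range(1, num_partitions)]

    def partition_of(pt):
        return bisect_right(bounds, hilbert_index(order, *to_cell(pt, bbox, order)))

    return partition_of, order


rng = random.Random(42)
points = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(20000)]
sample = rng.sample(points, 1000)  # the partitioner only ever sees the sample
partition_of, order = build_partitioner(sample, num_partitions=8)
sizes = Counter(partition_of(p) for p in points)
print(order, sorted(sizes.items()))
```

Consecutive Hilbert codes are spatially adjacent cells, so each contiguous code range keeps neighbouring objects on one node. Only the sample is scanned to pick the level and the boundaries, which matches the I/O-saving motivation in the text.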
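For the decomposition step in (2), a minimal sketch of grid-based accumulative decomposition might look like the code below. It walks objects in Hilbert (grid-cell) order and closes a "data brick" whenever the running workload crosses the next multiple of total/num_bricks. The per-object cost (for example, vertex count) and the name `decompose_into_bricks` are assumptions made for illustration; the thesis does not specify them here.

```python
def decompose_into_bricks(objects, num_bricks):
    """Cut Hilbert-ordered objects into bricks of roughly equal total cost.

    objects: list of (hilbert_code, cost) pairs; cost is a hypothetical
    workload proxy such as the vertex count of a geometry.
    """
    ordered = sorted(objects, key=lambda o: o[0])  # keep spatial neighbours together
    target = sum(cost for _, cost in ordered) / num_bricks
    bricks, current, acc = [], [], 0.0
    for obj in ordered:
        current.append(obj)
        acc += obj[1]
        # Close a brick each time the running total crosses the next multiple of target.
        if len(bricks) < num_bricks - 1 and acc >= target * (len(bricks) + 1):
            bricks.append(current)
            current = []
    if current:
        bricks.append(current)
    return bricks


objs = [(code, 5 if code in (0, 6) else 1) for code in range(12)]
bricks = decompose_into_bricks(objs, 4)
print([sum(c for _, c in b) for b in bricks])  # → [5, 5, 5, 5]
```

Cutting on cumulative cost rather than object count keeps a few heavy geometries from overloading one task. An object that alone exceeds the target still lands in a single brick in this sketch.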
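The given-depth tree-like merging in (2) can be illustrated as follows. The partial results are combined in k-way groups, with the fan-in k chosen so that k ** depth >= number of parts, so the merge finishes within the requested number of rounds. How the thesis picks the fan-in is not stated, so this choice is an assumption. Polygon union is replaced by a hypothetical 1-D interval union (`union_intervals`) so the example runs without a geometry library.

```python
import math


def tree_merge(parts, depth, merge):
    """Combine partial results in at most `depth` rounds of k-way merges.

    The fan-in k (>= 2) is chosen so that k ** depth >= len(parts),
    which bounds the height of the merge tree by `depth`.
    """
    if not parts:
        raise ValueError("nothing to merge")
    fan_in = max(2, math.ceil(len(parts) ** (1.0 / depth)))
    while fan_in ** depth < len(parts):  # guard against floating-point round-off
        fan_in += 1
    level, rounds = list(parts), 0
    while len(level) > 1:
        # Each group could be merged by a separate task in a parallel framework.
        level = [merge(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
        rounds += 1
    return level[0], rounds


def union_intervals(groups):
    """Stand-in for polygon union: merge overlapping 1-D buffers (a, b)."""
    spans = sorted(span for group in groups for span in group)
    merged = []
    for a, b in spans:
        if merged and a <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged


radius = 1
points = [0, 1, 5, 6, 20, 21.5, 40, 41]  # one point per "data brick"
parts = [[(p - radius, p + radius)] for p in points]  # per-brick 1-D buffers
result, rounds = tree_merge(parts, depth=2, merge=union_intervals)
print(result, rounds)  # → [(-1, 2), (4, 7), (19, 22.5), (39, 42)] 2
```

Fixing the depth bounds the number of synchronization rounds. Merging in groups avoids funnelling every partial buffer into one final union.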
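The key idea behind the precise aggregation index in (3) is that nodes fully inside the query rectangle are answered from stored aggregates, and only boundary nodes are refined. The sketch below illustrates this with a deliberately simplified, single-machine stand-in. A flat grid whose cells keep pre-computed COUNT/SUM plays the role of both the global grid index and the local aggregation R-tree. The class and method names are hypothetical.

```python
import random


class GridAggregateIndex:
    """Grid of cells that store pre-computed COUNT/SUM plus their raw points."""

    def __init__(self, points, cell_size):
        self.cell_size = cell_size
        self.cells = {}
        for x, y, v in points:
            cell = self.cells.setdefault(self._key(x, y), {"count": 0, "sum": 0.0, "items": []})
            cell["count"] += 1
            cell["sum"] += v
            cell["items"].append((x, y, v))

    def _key(self, x, y):
        return int(x // self.cell_size), int(y // self.cell_size)

    def query(self, xmin, ymin, xmax, ymax):
        """Return (count, sum) of values whose point lies in the closed rectangle."""
        cs = self.cell_size
        (i0, j0), (i1, j1) = self._key(xmin, ymin), self._key(xmax, ymax)
        count, total = 0, 0.0
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                cell = self.cells.get((i, j))
                if cell is None:
                    continue
                x0, y0 = i * cs, j * cs
                if x0 >= xmin and y0 >= ymin and x0 + cs <= xmax and y0 + cs <= ymax:
                    # Cell fully covered: answer from the stored aggregate, no scan.
                    count += cell["count"]
                    total += cell["sum"]
                else:
                    # Boundary cell: refine by scanning its points.
                    for x, y, v in cell["items"]:
                        if xmin <= x <= xmax and ymin <= y <= ymax:
                            count += 1
                            total += v
        return count, total


rng = random.Random(7)
pts = [(rng.uniform(0, 100), rng.uniform(0, 100), rng.uniform(0, 10)) for _ in range(5000)]
index = GridAggregateIndex(pts, cell_size=10)
count, total = index.query(15, 5, 72.5, 64)
print(count, round(total, 2))
```

In an aggregation R-tree, the same covered-versus-boundary test is applied recursively to tree nodes instead of to a single grid level.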
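The layer-by-layer approximate query in (3) can be sketched with classic online-aggregation estimators. The sketch assumes the following. The multi-layer samples are nested prefixes of one random permutation. SUM is estimated as N times the sample mean of per-record contributions. The confidence interval uses a normal approximation with a finite-population correction. None of these details are confirmed by the text. The names `progressive_sum` and the 1% stopping rule are hypothetical.

```python
import math
import random
import statistics


def progressive_sum(data, rect, layer_sizes, z=1.96, seed=0):
    """Yield (n, estimate, half_width) for SUM(v) over rect, one tuple per sample layer."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)  # prefixes of one random permutation = nested uniform samples
    big_n = len(shuffled)
    xmin, ymin, xmax, ymax = rect
    for n in layer_sizes:
        contrib = [v if xmin <= x <= xmax and ymin <= y <= ymax else 0.0
                   for x, y, v in shuffled[:n]]
        mean = statistics.fmean(contrib)
        sd = statistics.stdev(contrib) if n > 1 else 0.0
        fpc = math.sqrt((big_n - n) / (big_n - 1)) if big_n > 1 else 0.0
        yield n, big_n * mean, z * big_n * sd / math.sqrt(n) * fpc


rng = random.Random(1)
data = [(rng.uniform(0, 100), rng.uniform(0, 100), rng.uniform(1, 2)) for _ in range(10000)]
rect = (0, 0, 50, 100)
history = []
for n, est, half in progressive_sum(data, rect, [100, 1000, 10000]):
    history.append((n, est, half))
    print(f"n={n}: {est:.1f} ± {half:.1f}")
    if half <= 0.01 * est:  # hypothetical user stopping rule: 1% relative precision
        break
```

The interval narrows roughly as 1/sqrt(n) and collapses to zero once the whole dataset has been read, so the user can stop as soon as the returned interval is tight enough.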