
Research And Optimization Of Distributed Spatial Database Based On Hive

Posted on: 2016-09-01
Degree: Master
Type: Thesis
Country: China
Candidate: C X Li
GTID: 2308330470975435
Subject: Computer application technology
Abstract/Summary:
Distributed spatial databases are an active research topic at the intersection of spatial data technology and distributed database technology, and they are now used across many industries. Spatial data is large in volume, its attribute data is closely tied to the spatial data, and spatial computation is complex, so query efficiency has always been an important measure of a spatial database's performance. Hadoop, developed under the Apache Foundation, is a distributed system architecture that lets users easily build and use a distributed platform, and develop and run applications that process massive amounts of data. Hive is a reliable, efficient data warehouse tool built on Hadoop. This thesis studies the key technologies of Hive as a tool for distributed spatial databases. Compared with a traditional database, a Hadoop-based distributed spatial database system uses its distributed processing capability to improve spatial computing efficiency, which makes it better suited to computation over large volumes of spatial data. The main work is as follows:
(1) Starting from the basic theory of spatial data, the thesis analyzes the HDFS-based data warehouse tool Hive, the distributed computing model underlying Hive, and the HDFS distributed file system framework.
(2) It simulates spatial data retrieval and, by extending the HDFS-based Hive framework, designs and implements SQL extensions that add spatial query support to Hive, enabling Hive to query distributed spatial data.
(3) After a preliminary implementation of the distributed spatial database DSQ (Distributed Spatial Querier) was completed, the thesis partially solves the data skew problem that often appears during Hive queries, analyzes in detail the causes of data skew in distributed computing, and proposes several optimizations to improve data processing efficiency on HDFS.
(4) Experiments verify that the proposed Hive-based distributed spatial query system has a clear advantage in query efficiency and overall performance over a large computing device of comparable computing power.
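The abstract does not say which skew optimizations the thesis adopts. As an illustration only, the sketch below shows one widely used mitigation, key salting with two-stage aggregation, in plain Python rather than Hive. The function names, the `region_*` keys, the salt count, and the reducer count are all hypothetical and are not taken from the thesis.

```python
import random
import zlib
from collections import Counter


def reducer_for(key: str, num_reducers: int) -> int:
    """Deterministic hash partitioner (a stand-in for a shuffle partitioner)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def salt(key: str, hot_keys: set, num_salts: int, rng: random.Random) -> str:
    """Append a random suffix to hot keys so their records can spread out.

    Assumes original keys never contain the '#' separator.
    """
    if key in hot_keys:
        return f"{key}#{rng.randrange(num_salts)}"
    return key


def two_stage_count(records, hot_keys, num_salts=8, num_reducers=4, seed=0):
    """Count records per key using salted partial counts, then merge them."""
    rng = random.Random(seed)

    # Stage 1: each simulated reducer keeps partial counts per salted key.
    partials = [Counter() for _ in range(num_reducers)]
    for key in records:
        salted = salt(key, hot_keys, num_salts, rng)
        partials[reducer_for(salted, num_reducers)][salted] += 1
    loads = [sum(p.values()) for p in partials]

    # Stage 2: strip the salt and merge partial counts into final totals.
    totals = Counter()
    for partial in partials:
        for salted, n in partial.items():
            totals[salted.split("#", 1)[0]] += n
    return totals, loads


# One hot key dominates the input, which is the typical skew pattern.
records = ["region_A"] * 9000 + [f"region_{i}" for i in range(1000)]
totals, loads = two_stage_count(records, hot_keys={"region_A"})
print(totals["region_A"], loads)
```

Without salting, every `region_A` record hashes to the same reducer. With salting, the hot key becomes up to `num_salts` distinct shuffle keys, so its records can be split across reducers at the cost of a second, much smaller merge step.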
Keywords/Search Tags:distributed spatial database, Hive, Query expansion, Optimization framework