Research And Implementation Of Parallel Index For Space Information System

Posted on: 2019-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: H Yan
Full Text: PDF
GTID: 2428330572955608
Subject: Computer application technology
Abstract/Summary:
With the rapid development of space technology and the rapid growth of space information, the traditional mode of manual tracking has exposed more and more defects: space information data cannot be queried efficiently, information cannot be displayed intuitively, data cannot be updated in real time, and no analytical decision support is available. This thesis therefore constructs a space information system that associates space data in order to provide efficient data management, multi-dimensional data statistics and diversified data visualization services.

For large and growing space data sets, a high-performance database index can greatly improve retrieval performance, and parallel index technology offers a good solution for large-scale data retrieval. The thesis surveys traditional index techniques (tree-based, hash-based and inverted indexes) and existing big data indexing techniques (secondary, high-dimensional, two-layer and parallel indexes). The survey shows that traditional index techniques do not perform well on large-scale data, whereas parallel indexing can greatly improve the efficiency of big data indexing.

After a detailed study of index technology and the Hadoop framework, this thesis combines the HT-tree index with Hadoop and proposes the parallel HT-tree index. Using Hadoop's built-in fault tolerance and concurrency management, the parallel HT-tree index divides the whole data set into multiple partitions and builds the HT-tree of each partition in parallel. When a range query is executed in parallel on this index, the map phase narrows the search space by eliminating the parts of the HT-tree that cannot contain the search key, which shortens the range query time. This approach reduces index construction time and improves resource utilization, and so greatly improves retrieval efficiency for large-scale structured space data.

The thesis presents the detailed construction scheme and implementation algorithms of the parallel HT-tree index, and optimizes its execution performance in three ways: (1) a batch loading technique is used to build the small HT-tree of each partition, which improves space efficiency and reduces the overhead of building the HT-tree; (2) jobs are connected in a chained process, which reduces the access time for intermediate data; (3) a load balance optimization algorithm based on load statistics is proposed to reduce the load imbalance caused by data skew during the execution of the parallel HT-tree index.

The thesis then compares the performance of Hadoop without an index, the parallel cluster index and the parallel HT-tree index. The comparison verifies that the proposed parallel index effectively improves both the efficiency of index execution and the performance of searching structured data. Lastly, the thesis designs and implements the space information system, which provides five functions: authority management, space data management, latest updates, data query and data statistics. The parallel HT-tree index is applied in the system to improve the retrieval efficiency of large-scale space data.
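The partition-build-prune flow described in the abstract can be illustrated with a small single-machine sketch. It is not the thesis's implementation: the abstract does not specify the HT-tree structure, so a sorted key list stands in for each partition's index; a thread pool stands in for Hadoop map tasks; and quantiles of sampled keys stand in for the load-statistics-based balancing. All function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from concurrent.futures import ThreadPoolExecutor


def balanced_boundaries(sample_keys, n_parts):
    """Pick boundary keys at quantiles of a non-empty key sample.

    Assumption: a stand-in for the thesis's load-statistics algorithm.
    """
    s = sorted(sample_keys)
    return [s[len(s) * i // n_parts] for i in range(1, n_parts)]


def split_into_partitions(records, boundaries):
    """Range-partition (key, value) records by sorted boundary keys."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for key, value in records:
        parts[bisect_right(boundaries, key)].append((key, value))
    return parts


def build_partition_index(partition):
    """Batch-load one partition: sort once, remember min/max key.

    Assumption: a sorted list replaces the HT-tree of the partition.
    """
    entries = sorted(partition, key=lambda kv: kv[0])
    keys = [k for k, _ in entries]
    return {
        "keys": keys,
        "entries": entries,
        "min": keys[0] if keys else None,
        "max": keys[-1] if keys else None,
    }


def build_parallel_index(records, boundaries, workers=4):
    """Build every partition's index concurrently (threads, not Hadoop)."""
    parts = split_into_partitions(records, boundaries)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(build_partition_index, parts))


def range_query(indexes, low, high):
    """Return records with low <= key <= high, skipping partitions
    whose key range cannot overlap the query range."""
    hits = []
    for idx in indexes:
        if idx["min"] is None or idx["max"] < low or idx["min"] > high:
            continue  # pruned: this partition cannot contain a match
        start = bisect_left(idx["keys"], low)
        end = bisect_right(idx["keys"], high)
        hits.extend(idx["entries"][start:end])
    return hits


if __name__ == "__main__":
    keys = [25, 3, 17, 10, 8, 30, 12]
    records = [(k, f"obj{k}") for k in keys]
    boundaries = balanced_boundaries(keys, 3)
    indexes = build_parallel_index(records, boundaries)
    print(range_query(indexes, 9, 18))
```

Sorting each partition once mirrors the idea of batch loading, and the min/max check mirrors the map-phase pruning; in a real Hadoop job each partition would be handled by a separate task rather than a thread.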
Keywords/Search Tags:HT-tree, Parallel Index, Hadoop, Data Skew, Balanced Partitioning