
Optimization Of Massive Meteorological Structured Data Query Based On HBase

Posted on: 2017-02-22 | Degree: Master | Type: Thesis
Country: China | Candidate: X C Xu
GTID: 2308330485998911 | Subject: Software engineering
Abstract:
Massive volumes of meteorological observation data are key to making public weather services more refined, precise and personalized. Meteorological data grow by about 1 TB per day, and the timeliness required for storing, retrieving and sharing them is a challenge for traditional meteorological information systems built on the IOE (IBM, Oracle and EMC) stack. Platforms oriented toward meteorological data sharing have therefore become an active research topic. Such platforms use a distributed architecture with linear scalability to handle massive meteorological data, so load balancing and low query latency are key to using meteorological services efficiently. This work examines shortcomings in the data partitioning and indexing strategies of HBase (Hadoop Database) and proposes new solutions for the data import and query stages. The details and results are as follows.

(1) A load-based Region split strategy. During the import stage, the load value of each node is calculated, and the Region split threshold is then determined from this value. The load is computed by an evaluation function that combines several factors affecting RegionServer load balance, such as CPU utilization, memory utilization and I/O access rate; the weight of each factor is determined by the Analytic Hierarchy Process (AHP). Compared with HBase's built-in split strategy, this strategy dynamically adjusts the split threshold according to the load on each RegionServer and, to a certain extent, achieves better load balancing.

(2) A coprocessor-based secondary index model. First, this work designs the HBase table structure according to the characteristics of structured meteorological data files. It then models the secondary index according to the characteristics of HBase storage, persisting the original data into index tables based on hierarchical indexing theory. Finally, it uses coprocessor callback functions to synchronize and maintain the indexes. Compared with native HBase, this model overcomes HBase's limitation on non-rowkey queries and greatly improves the query efficiency of structured data.
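The load-based split strategy in (1) can be illustrated with a small sketch. The abstract does not publish the AHP pairwise judgments, the exact evaluation function, or how load is turned into a threshold, so all of these are assumptions here: the comparison matrix `PAIRWISE` is hypothetical, the load is taken to be a plain weighted sum of utilization ratios, and the hypothetical `split_threshold` helper maps load linearly onto a byte range, giving busier RegionServers a larger threshold so they split less often. The AHP weights use the standard row geometric-mean approximation of the principal eigenvector, with a consistency-ratio check against Saaty's random index.

```python
import math

# Criteria for the RegionServer load evaluation function.
CRITERIA = ("cpu", "memory", "io")

# Hypothetical pairwise comparison matrix on Saaty's 1-9 scale:
# CPU judged 2x as important as memory and 3x as important as I/O.
# These judgments are illustrative, not taken from the thesis.
PAIRWISE = [
    [1.0, 2.0, 3.0],
    [1 / 2, 1.0, 2.0],
    [1 / 3, 1 / 2, 1.0],
]

# Saaty's random consistency index for matrix sizes n = 1..10.
RANDOM_INDEX = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45, 1.49]


def ahp_weights(matrix):
    """Return (weights, consistency_ratio) using the row geometric-mean method."""
    n = len(matrix)
    geo = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo)
    weights = [g / total for g in geo]
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n - 1]
    cr = ci / ri if ri > 0 else 0.0
    return weights, cr


def node_load(metrics, weights):
    """Weighted sum of utilization ratios, each expected in [0, 1]."""
    return sum(w * metrics[name] for name, w in zip(CRITERIA, weights))


def split_threshold(load, min_bytes, max_bytes):
    """Hypothetical linear mapping from node load to Region split threshold.

    Busier servers get a larger threshold, so they split less often.
    """
    load = min(max(load, 0.0), 1.0)
    return int(min_bytes + load * (max_bytes - min_bytes))


def should_split(store_bytes, threshold):
    """A Region is split once its store size exceeds the threshold."""
    return store_bytes > threshold


if __name__ == "__main__":
    weights, cr = ahp_weights(PAIRWISE)
    print("weights:", [round(w, 3) for w in weights], "CR:", round(cr, 3))
    for name, metrics in {
        "busy": {"cpu": 0.9, "memory": 0.7, "io": 0.8},
        "idle": {"cpu": 0.1, "memory": 0.3, "io": 0.2},
    }.items():
        load = node_load(metrics, weights)
        threshold = split_threshold(load, 1 << 30, 10 << 30)
        print(name, round(load, 3), threshold, should_split(4 << 30, threshold))
```

The opposite mapping (splitting busy nodes earlier so the balancer can move the daughter Regions elsewhere) is equally plausible; the abstract does not say which direction the thesis uses, so the mapping function is the part to replace first.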
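The secondary index model in (2) can be sketched in memory to show why an index table makes non-rowkey queries cheap. `SortedTable` is a stand-in for an HBase table (rows kept sorted by rowkey), and the hypothetical `_post_put` / `_post_delete` methods play the role that RegionObserver callbacks play in a real coprocessor. The main rowkey layout (`station|yyyymmddhh`), the index rowkey (`value|main rowkey`), the `weather_code` column, and copying the whole row into the index (one reading of "persisting the original data into index tables") are all assumptions; the thesis's hierarchical index structure is not reproduced.

```python
import bisect


class SortedTable:
    """In-memory stand-in for an HBase table: rows kept sorted by rowkey."""

    def __init__(self):
        self._keys = []  # sorted rowkeys
        self._rows = {}  # rowkey -> {column: value}

    def put(self, rowkey, cells):
        # Simplification: replaces the whole row (HBase merges columns).
        if rowkey not in self._rows:
            bisect.insort(self._keys, rowkey)
        self._rows[rowkey] = dict(cells)

    def get(self, rowkey):
        return self._rows.get(rowkey)

    def delete(self, rowkey):
        if rowkey in self._rows:
            del self._rows[rowkey]
            self._keys.pop(bisect.bisect_left(self._keys, rowkey))

    def scan_prefix(self, prefix):
        """Yield (rowkey, row) for keys starting with prefix, in rowkey order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, self._rows[key]
            i += 1


class IndexedTable:
    """Main table plus one covering index table per indexed column.

    Main rowkey (assumed): "<station id>|<yyyymmddhh>".
    Index rowkey (assumed): "<column value>|<main rowkey>".
    """

    def __init__(self, indexed_columns):
        self.main = SortedTable()
        self.indexes = {col: SortedTable() for col in indexed_columns}

    def put(self, rowkey, cells):
        old = self.main.get(rowkey)
        self.main.put(rowkey, cells)
        self._post_put(rowkey, old, cells)

    def delete(self, rowkey):
        old = self.main.get(rowkey)
        self.main.delete(rowkey)
        if old is not None:
            self._post_delete(rowkey, old)

    # The two hooks below stand in for coprocessor callbacks.
    def _post_put(self, rowkey, old, new):
        if old is not None:
            self._post_delete(rowkey, old)  # drop stale index entries first
        for col, index in self.indexes.items():
            if col in new:
                # Copy the full row into the index so queries need no second lookup.
                index.put(f"{new[col]}|{rowkey}", {"rowkey": rowkey, **new})

    def _post_delete(self, rowkey, old):
        for col, index in self.indexes.items():
            if col in old:
                index.delete(f"{old[col]}|{rowkey}")

    def query(self, column, value):
        """Non-rowkey query answered by one prefix scan of the index table."""
        return [row for _, row in self.indexes[column].scan_prefix(f"{value}|")]


if __name__ == "__main__":
    table = IndexedTable(["weather_code"])
    table.put("54511|2016010108", {"temp": "3.2", "weather_code": "01"})
    table.put("58367|2016010108", {"temp": "9.5", "weather_code": "01"})
    for row in table.query("weather_code", "01"):
        print(row["rowkey"], row["temp"])
```

A query on `weather_code` then costs one prefix scan of the index table instead of a full scan of the main table. A real coprocessor would write to a separate HBase index table from its callbacks and would also need to handle a failure between the data write and the index write, which this sketch ignores.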
Keywords: meteorological data, HBase, split strategy, secondary index