The Research And Implementation Of Indexing And Query Techniques Based On HBase And In-memory Database

Posted on:2015-08-10

Degree:Master

Type:Thesis

Country:China

Candidate:W H Zhou

Full Text:PDF

GTID:2308330485990648

Subject:Computer software and theory

Abstract/Summary:


With the development of computer technology, Internet applications have evolved toward large-scale, pervasive patterns. The volume of data generated on the Internet is growing rapidly and will become enormous in the near future, so the storage, computation and processing of big data have become pressing problems. After Google published its three classic papers in the field of cloud computing, people began to rethink how massive amounts of data should be stored, and non-relational data storage systems gradually became the mainstream of the big data era.

HBase is an open-source project modeled after Google’s BigTable, developed as part of the Apache Software Foundation’s Apache Hadoop project. It is a typical non-relational, column-oriented database. In practice, HBase efficiently supports retrieving data by primary key (the row key). However, retrieving data by the value or value range of any other column requires a full table scan, which is very inefficient. Traditional relational databases solve this problem with database indexes, so, to shorten retrieval response time and reduce query overhead, researchers began to study indexing methods for HBase.

The major contributions of this paper are as follows:

First, building on the ordered indexing model and the hash indexing model, this paper proposes a hierarchical indexing model for HBase in specific application scenarios. The model has two layers: the persistent layer, which stores all indices, and the memory-cache layer, which stores the most frequently accessed indices.

Second, this paper designs and implements a hierarchical index storage management system that provides scalability and high availability.
In addition, to address the shortcomings of LRU in a big data environment, this paper presents a hotness-sensitive cache replacement policy that uses exponential smoothing to achieve high accuracy.

Third, this paper proposes a fast retrieval method based on the hierarchical indexing system, primarily to support retrieval by value or value range, and gives an improved version of the method that reduces communication overhead.

Several experiments are conducted to verify the hierarchical indexing system. The experimental results show that the method achieves excellent performance and scalability.
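The two-layer model described in the first contribution can be illustrated with a minimal Python sketch. It is not the thesis's implementation: an in-memory sorted list stands in for the persistent layer (which, per the abstract, stores all indices), and a bounded hash map plays the memory-cache layer. The class names `PersistentIndexLayer` and `HierarchicalIndex`, the sample "age" column, and the use of plain LRU eviction in the cache are hypothetical choices for illustration. Point lookups try the hash-based cache first, while value-range lookups go to the ordered layer, which is one reason to combine an ordered model with a hash model.

```python
import bisect
from collections import OrderedDict


class PersistentIndexLayer:
    """Stand-in for the persistent layer: an ordered secondary index mapping an
    indexed column value to the set of HBase row keys that hold it.
    (Assumption: kept in memory here; a real system would store it durably.)"""

    def __init__(self):
        self._values = []      # distinct indexed values, kept sorted
        self._postings = {}    # value -> set of row keys
        self.reads = 0         # lookups that had to reach this slower layer

    def put(self, value, row_key):
        if value not in self._postings:
            bisect.insort(self._values, value)
            self._postings[value] = set()
        self._postings[value].add(row_key)

    def get(self, value):
        self.reads += 1
        return set(self._postings.get(value, ()))

    def range(self, low, high):
        """Row keys whose indexed value v satisfies low <= v <= high."""
        self.reads += 1
        lo = bisect.bisect_left(self._values, low)
        hi = bisect.bisect_right(self._values, high)
        rows = set()
        for v in self._values[lo:hi]:
            rows |= self._postings[v]
        return rows


class HierarchicalIndex:
    """Two layers: every entry is persisted; a bounded hash map caches hot values.
    Plain LRU eviction is used here only as a placeholder policy."""

    def __init__(self, cache_capacity):
        self.persistent = PersistentIndexLayer()
        self.cache = OrderedDict()   # memory-cache layer: value -> set of row keys
        self.capacity = cache_capacity

    def insert(self, value, row_key):
        self.persistent.put(value, row_key)  # the persistent layer holds all indices
        self.cache.pop(value, None)          # drop a now-stale cached entry

    def lookup(self, value):
        if value in self.cache:              # cache hit: no persistent read
            self.cache.move_to_end(value)
            return set(self.cache[value])
        rows = self.persistent.get(value)
        self.cache[value] = rows
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict the least recently used value
        return set(rows)

    def range_lookup(self, low, high):
        # Value ranges are answered by the ordered persistent layer.
        return self.persistent.range(low, high)


# Usage: index a hypothetical "age" column instead of scanning the whole table.
idx = HierarchicalIndex(cache_capacity=2)
for row_key, age in [("u1", 25), ("u2", 31), ("u3", 25), ("u4", 47)]:
    idx.insert(age, row_key)

print(sorted(idx.lookup(25)))            # ['u1', 'u3'] (read from persistent layer)
print(sorted(idx.lookup(25)))            # ['u1', 'u3'] (served from the cache)
print(idx.persistent.reads)              # 1
print(sorted(idx.range_lookup(20, 35)))  # ['u1', 'u2', 'u3']
```

Each index hit yields row keys, which would then be fetched from HBase by primary key, the access path HBase already handles efficiently.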
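The hotness-sensitive replacement idea can likewise be sketched under stated assumptions. The abstract does not give the exact formula, so this sketch uses standard simple exponential smoothing, H_t = α·x_t + (1 − α)·H_{t−1}, where x_t is a key's access count in time window t. The window boundaries, α = 0.5, and the names `HotnessCache` and `end_window` are hypothetical. The cache evicts the key with the lowest smoothed score rather than the least recently used one.

```python
class HotnessCache:
    """Hypothetical hotness-sensitive cache for index entries.

    Each key carries a smoothed hotness H, updated at the end of every time
    window with simple exponential smoothing:
        H_t = alpha * x_t + (1 - alpha) * H_{t-1}
    where x_t is the number of accesses in window t. On overflow, the key with
    the lowest provisional score (the same formula applied to the hits seen so
    far in the open window) is evicted.
    """

    def __init__(self, capacity, alpha=0.5):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.capacity = capacity
        self.alpha = alpha
        self.data = {}         # key -> cached value
        self.hotness = {}      # key -> H_{t-1}, smoothed over closed windows
        self.window_hits = {}  # key -> x_t, accesses in the open window

    def _score(self, key):
        return self.alpha * self.window_hits[key] + (1 - self.alpha) * self.hotness[key]

    def get(self, key):
        if key not in self.data:
            return None        # miss: the caller falls back to the persistent layer
        self.window_hits[key] += 1
        return self.data[key]

    def put(self, key, value):
        if key not in self.data:
            if len(self.data) >= self.capacity:
                victim = min(self.data, key=self._score)
                for table in (self.data, self.hotness, self.window_hits):
                    del table[victim]
            self.hotness[key] = 0.0
            self.window_hits[key] = 0
        self.data[key] = value
        self.window_hits[key] += 1   # a write counts as one access

    def end_window(self):
        """Close the current window: fold x_t into H and reset the counters."""
        for key in self.data:
            self.hotness[key] = self._score(key)
            self.window_hits[key] = 0


# Usage: "a" was hot in an earlier window; "b" was touched only recently.
cache = HotnessCache(capacity=2, alpha=0.5)
cache.put("a", 1)
for _ in range(5):
    cache.get("a")
cache.end_window()         # H(a) = 0.5 * 6 + 0.5 * 0 = 3.0
cache.put("b", 2)
cache.get("b")             # "a" is now the least recently used key
cache.put("c", 3)          # scores: a = 1.5, b = 1.0, so "b" is evicted
print(sorted(cache.data))  # ['a', 'c']
```

Plain LRU would have evicted "a" here because it was touched least recently, whereas the smoothed score keeps it because of its sustained access history. A larger α weights the current window more heavily and reacts faster to workload shifts; a smaller α favors long-term popularity.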

Keywords/Search Tags:

big data, HBase, secondary index, hierarchical storage, cache replacement policy


Related items

1	Research On Indexing And Query Method Of Log Big Data
2	The Research And Implementation Of Indexing And Query Techniques Based On Range Query
3	Design And Implementation Of HBase Hierarchical Auxiliary Index System
4	Research And Design Of Distributed Caching Policy On HBase
5	Research On Cache Based Database Index
6	The Research On Data Replacement Policy Based On Request Frequency Of NDN Cache
7	Research And Development Of Big Data Storage Systems Based On Hbase
8	Research Of Big Data Store Query Technology Based On HBase
9	Research On GNSS Data Storage And Retrieval Based On HBASE
10	Research On Techniques And Systems For Index And Query Optimization Of Big Data