
The Research And Implementation Of Indexing And Query Techniques Based On HBase And In-memory Database

Posted on: 2015-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: W H Zhou
GTID: 2308330485990648
Subject: Computer software and theory
Abstract/Summary:
With the development of computer technology, Internet applications have grown large in scale and universal in reach. The data generated on the Internet is growing rapidly, and its volume will become enormous in the near future. Storing, computing on, and processing big data has become a pressing problem. After Google published its three classic papers in the field of cloud computing, people began to rethink how massive amounts of data should be stored, and non-relational data storage systems gradually became mainstream in the era of big data.

HBase is an open source project modeled after Google's BigTable and developed as part of the Apache Software Foundation's Apache Hadoop project. HBase is a typical non-relational, column-oriented database. In practice, HBase efficiently supports retrieval of data by primary key. When data must be retrieved by the value or value range of some other column, however, a full table scan is needed, which is very inefficient. In traditional relational databases, this problem is solved with database indexes. To improve retrieval response time and reduce query overhead, researchers began to study indexing methods for HBase.

The major contributions of this paper are as follows.

First, based on the ordered indexing model and the hash indexing model, this paper proposes a hierarchical indexing model for HBase in specific application scenarios. The model is divided into two layers: the persistent layer, which stores all indices, and the memory-cache layer, which stores the most frequently accessed indices.

Second, this paper designs and implements a hierarchical index storage management system that provides scalability and high availability. In addition, to address the shortcomings of LRU in a big data environment, it presents a hot-sensitive cache replacement policy that uses exponential smoothing to achieve high accuracy.

Third, this paper proposes a fast retrieval method based on the hierarchical indexing system, primarily to support retrieval by value or value range. To reduce communication overhead, an improved retrieval method is also given.

Several experiments verify the hierarchical indexing system. The experimental results show that the method achieves excellent performance and scalability.
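
The abstract names a hot-sensitive cache replacement policy based on exponential smoothing but does not give its formula, so the following is only a minimal sketch under assumptions, not the thesis's implementation. It assumes that time is divided into periods, that each cached index entry keeps a heat score updated as `heat = alpha * hits_in_period + (1 - alpha) * previous_heat`, and that the entry with the lowest estimated heat is evicted. The class name `HotSensitiveCache`, the parameter `alpha`, and the method `end_period` are hypothetical names introduced for illustration.

```python
class HotSensitiveCache:
    """Toy cache that evicts the entry with the lowest smoothed heat score.

    Illustrative only: the scoring rule is an assumed form of exponential
    smoothing, not the formula used in the thesis.
    """

    def __init__(self, capacity, alpha=0.5):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.capacity = capacity
        self.alpha = alpha   # smoothing factor (assumed value)
        self._values = {}    # key -> cached index entry
        self._heat = {}      # key -> smoothed heat from completed periods
        self._hits = {}      # key -> access count in the current period

    def _score(self, key):
        # Estimated heat if the current period ended right now.
        return (self.alpha * self._hits[key]
                + (1.0 - self.alpha) * self._heat[key])

    def end_period(self):
        """Fold the current period's access counts into the smoothed heat."""
        for key in self._values:
            self._heat[key] = self._score(key)
            self._hits[key] = 0

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key not in self._values:
            return None
        self._hits[key] += 1
        return self._values[key]

    def put(self, key, value):
        """Insert or update an entry, evicting the coldest one if full."""
        if key not in self._values:
            if len(self._values) >= self.capacity:
                victim = min(self._values, key=self._score)
                for table in (self._values, self._heat, self._hits):
                    del table[victim]
            self._heat[key] = 0.0
            self._hits[key] = 1  # the insertion counts as one access
        self._values[key] = value
```

With a capacity of 2, suppose key `a` is read several times in one period and key `b` is read once in the next period. Plain LRU would evict `a` because `b` was touched more recently, whereas this sketch evicts `b` because the smoothed history of `a` still outweighs the single recent access to `b`.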
Keywords/Search Tags: big data, HBase, secondary index, hierarchical storage, cache replacement policy