
Nimbus: Scalable, distributed, in-memory data storage

Posted on: 2014-02-28
Degree: M.S.
Type: Thesis
University: University of Maryland, Baltimore County
Candidate: Shook, Adam J
Full Text: PDF
GTID: 2458390008957433
Subject: Information Technology
Abstract/Summary:
The Apache Hadoop project provides a framework for reliable, scalable, distributed computing. The storage layer of Hadoop, called the Hadoop Distributed File System (HDFS), is an append-only distributed file system designed for commodity hardware. The append-only nature of the file system limits the ability of applications to perform random reads and writes of data. This was addressed by Apache HBase and Apache Accumulo, which both allow for quick random access to a highly scalable key/value store.

However, these projects still require data to be read from the local disk of the server, and therefore cannot deliver the I/O throughput that many applications require. This limits the potential of "hot" data sets that cannot be stored in the memory of one machine but do not need the scalability of HBase, i.e., those that can be sharded and stored in memory on dozens of machines. These data sets are often referenced by many applications and can be several gigabytes in size.

Nimbus is a project designed for Hadoop to expose distributed in-memory data structures, backed by the reliability of HDFS and fully integrated with MapReduce input and output formats. Through a series of I/O benchmarks against HBase, the following discusses relevant use cases and demonstrates Nimbus's performance advantage over HBase, allowing for high-throughput data fetch operations for performant applications.
Keywords/Search Tags: Data, Distributed, Scalable, Hadoop, Applications, HBase