Nimbus: Scalable, distributed, in-memory data storage

Posted on: 2014-02-28

Degree: M.S.

Type: Thesis

University: University of Maryland, Baltimore County

Candidate: Shook, Adam J

GTID: 2458390008957433

Subject: Information Technology

Abstract/Summary:

The Apache Hadoop project provides a framework for reliable, scalable, distributed computing. The storage layer of Hadoop, the Hadoop Distributed File System (HDFS), is an append-only distributed file system designed for commodity hardware. Its append-only nature limits the ability of applications to perform random reads and writes. Apache HBase and Apache Accumulo address this limitation: both provide fast random access to a highly scalable key/value store.

However, these projects still read data from the servers' local disks and therefore cannot deliver the I/O throughput that many applications require. This is a limitation for “hot” data sets that are too large to fit in the memory of a single machine but do not need the full scalability of HBase, i.e., data sets that can be sharded and held in memory across dozens of machines. Such data sets are often referenced by many applications and can be several gigabytes in size.

Nimbus is a project designed for Hadoop that exposes distributed in-memory data structures backed by the reliability of HDFS, and it is fully integrated with MapReduce input and output formats. This thesis discusses relevant use cases and, through a series of I/O benchmarks executed against HBase, demonstrates Nimbus's performance advantage over HBase, which enables high-throughput data fetch operations for performance-sensitive applications.

Keywords/Search Tags:

Data, Distributed, Scalable, Hadoop, Applications, HBase

Related items

1	Research On Distributed Processing Of Massive Video Data Based On Hadoop
2	The Describing Of Sensing Device Platform Based On Hadoop Distributed Data Storage
3	Design And Implementation Of Distributed Query Algorithm Processing Communication Data Based On Hadoop
4	Implementation Of The Massive Telecom Data Distributed Storage And Query System Based On Hadoop
5	Research Of Big Data Store Query Technology Based On HBase
6	Research And Design On Coding And Searching XML Data In Distributed System
7	The Research And Design Of Distributed Vertical Search Engine
8	Research And Implementation On A Distributed Service Registry Based On HADOOP Platform
9	On The Hadoop Based Distributed Storage Techniques And Its Applications In Content Dissemination Design
10	Huge Amounts Of Sensing Data Management System Based On Hadoop