
An Approach To Hotspots Rebalancing In MongoDB Via Repartitioning And Compaction

Posted on: 2019-11-23 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Q Wu
GTID: 2428330590492470 | Subject: Software engineering
Abstract/Summary:
Big data has become an integral part of modern society and has significantly changed e-commerce, AI, online education, and other fields. Many enterprises use big data to boost their productivity. However, big data also challenges the infrastructure of modern enterprises, especially storage, which is the foundation of processing and analysis. Traditional storage facilities, including simple disks, RAID, high-level local file systems, and application-level relational database management systems, cannot meet the throughput and volume requirements of big data. A variety of distributed storage systems, such as Google File System, BigTable, and MongoDB, have been developed to tackle these issues. Different storage systems provide different storage models and performance tradeoffs.

MongoDB is a representative modern distributed storage system. It provides both high scalability and high availability, and its storage model is more user-friendly. MongoDB is therefore widely deployed around the world. However, MongoDB is not a perfect distributed storage system, because it lacks some core features of distributed systems, namely autoscaling and rebalancing. Autoscaling lets a distributed system scale out when it faces intensive load or large storage demand, so that it can handle more requests and hold more data. Conversely, when the system's load is low or its storage capacity is superfluous, it scales in to release resources. Rebalancing lets a distributed system adapt to varying access patterns. Data in a distributed storage system is generally spread across the whole cluster, but different pieces of data differ in heat, which results in different access rates. A distributed system with rebalancing can repartition data and redistribute it to different nodes so that it is spread evenly across the cluster. After rebalancing, the resource utilization and the throughput of the whole cluster are greatly improved.

To address these shortcomings of MongoDB, this thesis designs and implements a novel non-intrusive framework that extends native MongoDB with the ability to autoscale and rebalance. For autoscaling, the framework separates resource management from storage management. A native MongoDB cluster needs human intervention to allocate new machines, provision them, configure MongoDB instances on them, and finally add them to the existing cluster. This procedure is both tedious and error-prone. Our framework uses a separate resource management layer that manages all computation and storage resources; MongoDB itself is responsible only for providing the storage model. The framework provides a real-time monitor module and a prediction module that monitor all MongoDB clusters running on top of it. When a cluster lacks resources, the framework scales it out by requesting a new resource container from the underlying resource management layer, configuring a MongoDB instance, and adding the instance to the cluster. When a cluster has superfluous resources, a similar scale-in procedure is performed.

For rebalancing, the framework provides repartitioning, migration, and merging operations. When the monitor module detects a hotspot shard, the balancer module scans the data chunks on that shard and analyzes their heat distribution. It then repartitions data chunks on the shard and migrates some of them out in order to reduce the shard's heat. The shards that receive the migrated chunks take on more traffic. As a result, traffic is distributed more evenly across the whole cluster and the overall throughput improves. Meanwhile, to prevent the metadata explosion caused by repeated repartitioning, the framework periodically merges and compacts cold data chunks. This keeps the chunk count low, which further improves the efficiency of routing and range operations.

This thesis uses Mesos as the resource management layer and validates the autoscaling and rebalancing strategies on top of it. The results show that it is viable to use a resource management layer to separate resource management and to build storage services above it. In addition, with a non-intrusive monitoring and scaling framework, MongoDB can autoscale without human intervention. Rebalancing based on repartitioning and migration copes effectively with hotspot shards: after rebalancing, traffic is routed evenly across the whole cluster and throughput improves. Beyond repartitioning, the framework also provides data compaction, which effectively eliminates the metadata explosion caused by repartitioning and keeps the metadata at a normal level.
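The repartition, migrate, and merge cycle described in the abstract can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the thesis's implementation: the `Chunk` and `Shard` classes, the midpoint split, the uniform-heat assumption inside a chunk, and the `hot_ratio` and `cold_heat` thresholds are all hypothetical, and nothing here talks to a real MongoDB cluster.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    lo: int       # inclusive lower bound of the key range
    hi: int       # exclusive upper bound of the key range
    heat: float   # observed access rate (hypothetical unit)


@dataclass
class Shard:
    name: str
    chunks: list = field(default_factory=list)

    @property
    def heat(self) -> float:
        return sum(c.heat for c in self.chunks)


def split(chunk):
    """Split a chunk at its midpoint; assumes heat is uniform inside it."""
    mid = (chunk.lo + chunk.hi) // 2
    half = chunk.heat / 2
    return Chunk(chunk.lo, mid, half), Chunk(mid, chunk.hi, half)


def rebalance(shards, hot_ratio=1.5, max_rounds=100):
    """Split the hottest chunk of a hotspot shard and migrate one half to the
    coldest shard, until no shard exceeds hot_ratio times the mean heat.
    Assumes a non-empty shard list. Returns the number of migrated chunks."""
    mean = sum(s.heat for s in shards) / len(shards)  # total heat is preserved
    moved = 0
    for _ in range(max_rounds):  # hard cap so the sketch always terminates
        hot = max(shards, key=lambda s: s.heat)
        cold = min(shards, key=lambda s: s.heat)
        if hot.heat <= hot_ratio * mean:
            break  # no hotspot shard left
        hottest = max(hot.chunks, key=lambda c: c.heat)
        if hottest.hi - hottest.lo < 2:
            break  # a single-key chunk cannot be repartitioned further
        left, right = split(hottest)
        hot.chunks[hot.chunks.index(hottest)] = left  # repartition in place
        cold.chunks.append(right)                     # migrate one half out
        moved += 1
    return moved


def merge_cold(shard, cold_heat=5.0):
    """Merge key-adjacent chunks on one shard when both are cold, which
    keeps the chunk count (and thus routing metadata) small."""
    merged = []
    for c in sorted(shard.chunks, key=lambda c: c.lo):
        prev = merged[-1] if merged else None
        if (prev is not None and prev.hi == c.lo
                and prev.heat < cold_heat and c.heat < cold_heat):
            merged[-1] = Chunk(prev.lo, c.hi, prev.heat + c.heat)
        else:
            merged.append(c)
    shard.chunks = merged
```

In this toy model, `rebalance` would be triggered when a monitor reports a hotspot shard and `merge_cold` would run periodically on each shard. A real deployment would additionally have to measure chunk heat and issue the corresponding split, move, and merge operations against the cluster, which this sketch leaves out.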
Keywords/Search Tags:Distributed Storage, Scalability, Elasticity, Rebalancing, Compaction