Research On The Consistency Maintenance Mechanism Of Data Intensive Distributed Storage Systems

Posted on:2019-07-17

Degree:Master

Type:Thesis

Country:China

Candidate:M K Ruan

GTID:2428330590451726

Subject:Software engineering

Abstract/Summary:

With the development and popularization of cloud computing, a large number of computing tasks have migrated from local devices to the cloud. At the same time, the rise of big data and artificial intelligence has imposed higher requirements on large-scale cloud computing capabilities, and the demand for computing power and data storage capacity has grown exponentially. As a result, improving the performance and efficiency of cloud-based systems can bring great economic benefits.

The back end of a cloud computing system is generally an efficient distributed system: nodes are connected through a network and, as a unified whole, provide computing and storage services. Data consistency is an important prerequisite for distributed systems. On the one hand, coordination and scheduling among the nodes require that all nodes have a consistent view of certain global variables of the system. On the other hand, to ensure data reliability, distributed storage systems store multiple replicas of a single piece of data on different physical nodes, thereby reducing the possibility of data loss. In a distributed system built on general commercial servers, network failures, hardware failures, and other faults frequently cause inconsistencies among replicas after a series of reads and writes, which a distributed storage system must avoid. Maintaining the consistency of data among the nodes of a distributed system is therefore very important. However, consistency maintenance requires frequent checksum transmissions and consumes a large amount of computing resources. When the number of nodes and replicas is large and the network infrastructure is insufficient, maintaining consistency among replicas becomes a performance bottleneck.

This thesis takes OpenStack Swift as an example and analyzes the main performance bottlenecks, their root causes, and solutions. The main contributions of the thesis are: (1) An in-depth study of the technical background of distributed systems. Through theoretical analysis combined with experiments, we analyze how performance bottlenecks emerge in the presence of data updates, during node failures, and during the recovery of a failed node. (2) This thesis proposes a method for maintaining hashes in memory to avoid frequent disk I/O, thereby improving overall system performance in the presence of data updates. (3) This thesis proposes a mechanism for detecting and handling failed nodes. It detects and handles node failures with moderate overhead, which strengthens the robustness of OpenStack Swift, and it helps a failed node quickly rejoin the system in a consistent, up-to-date state. (4) In order to measure network traffic accurately, this thesis implements a tool for monitoring network traffic on Linux. The tool measures the inbound and outbound traffic of all ports opened by a process between two specific events.
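To illustrate the idea behind contribution (2), the following is a minimal sketch of keeping replica hashes in memory so that comparing replicas does not require rescanning the disk. It is not the thesis's implementation and does not use OpenStack Swift's code: the class `InMemorySuffixHashes`, the function `suffixes_to_sync`, the grouping of object files by a "suffix" key, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, Set


class InMemorySuffixHashes:
    """Hypothetical in-memory index of per-suffix hashes for one partition.

    Assumption: each update is reported to this object, so a hash can be
    recomputed from names held in memory instead of listing directories.
    """

    def __init__(self) -> None:
        self._names: Dict[str, Set[str]] = {}  # suffix -> object file names
        self._hashes: Dict[str, str] = {}      # suffix -> cached digest
        self.recomputations = 0                # counts cache misses

    def record_update(self, suffix: str, filename: str) -> None:
        """Register a written file and invalidate only the affected suffix."""
        self._names.setdefault(suffix, set()).add(filename)
        self._hashes.pop(suffix, None)

    def suffix_hash(self, suffix: str) -> str:
        """Return the cached digest, recomputing it only after an update."""
        cached = self._hashes.get(suffix)
        if cached is None:
            digest = hashlib.sha256()
            for name in sorted(self._names.get(suffix, ())):
                digest.update(name.encode("utf-8"))
            cached = digest.hexdigest()
            self._hashes[suffix] = cached
            self.recomputations += 1
        return cached

    def snapshot(self) -> Dict[str, str]:
        """Digest of every known suffix, as would be sent to a peer replica."""
        return {suffix: self.suffix_hash(suffix) for suffix in self._names}


def suffixes_to_sync(local: Dict[str, str], remote: Dict[str, str]) -> Set[str]:
    """Suffixes whose local digest is missing or different on the remote."""
    return {s for s, h in local.items() if remote.get(s) != h}


if __name__ == "__main__":
    node_a, node_b = InMemorySuffixHashes(), InMemorySuffixHashes()
    node_a.record_update("abc", "1563321600.00000.data")
    node_b.record_update("abc", "1563321600.00000.data")
    node_a.record_update("abc", "1563321700.00000.data")  # update seen only by A
    print(suffixes_to_sync(node_a.snapshot(), node_b.snapshot()))
```

In this sketch an update invalidates a single cached digest, so repeated consistency checks between updates cost only dictionary lookups. A real system would also need to persist or rebuild the index after a restart, which is omitted here.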

Keywords/Search Tags:

distributed system, consistency, synchronization, performance optimization

Related items

1	Design And Optimization Of Data Consistency Protocol For Distributed Storage System
2	Research On Data Consistency And Load Balanceing Optimization Of Distributed Cluster System
3	Research On Distributed Virtual Missile Launching & Testing System And Simulation Consistency Based On HLA
4	Research And Optimization Of Distributed Consistency Algorithm For Wide Area Distributed Storage System
5	Research And Implementation Of Distributed Transactions In Microservices Architecture
6	Research On Distributed SLAM Based On Analysis And Optimization Of Performance
7	Distributed Systems, Clock Synchronization And The Event Causal Consistency Study
8	Performance analysis of live-virtual-constructive and distributed virtual simulations: Defining requirements in terms of temporal consistency
9	Research On Generalized Time-space Consistency Framework, Evolution Mechanism And Its Applications In Distributed Simulations
10	The Design And Implementation Of Performance Optimized Distributed Storage Subsystem