
The Formal Modeling And Optimization For Data Partitioning And Replica Consistency In Distributed Storage Systems

Posted on: 2018-05-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X D Huang
GTID: 1368330566488041
Subject: Software engineering
Abstract/Summary:
As big data grows in popularity, distributed storage systems are deployed in many applications. Many of these systems use complex data partitioning methods and multi-replica mechanisms, and they let users trade system performance against advanced features. As a result, users must tune the system to obtain the performance they want. However, users who do not understand how the system works find it hard to explain its behavior, analyze the causes of what they observe, and optimize it. Worse, the complex implementation of distributed storage systems makes those principles even harder to understand. In this dissertation, we propose a Petri Net modeling framework for distributed storage systems based on system logs. Using the Petri Net model, we show how to optimize data partitioning and replica consistency. The main contributions are as follows:

· Log events have complex relations, and the models generated by current mining algorithms suffer from poor readability, large scale, and weak elasticity. To address this, the dissertation proposes a two-stage modeling framework called "log mining-model transition". In the first stage, a "local-global" log mining method splits the log according to the owners of the events, which simplifies event relations and makes the generated models more readable. The method then assembles the generated models according to the relations between them, producing an elementary net. In the second stage, the elementary net is converted into a Coloured Petri Net by folding and symmetrizing the net model; the adaptation problem is also addressed at this stage. The result is a more expressive Coloured Petri Net model.

· To address the imbalance of data partitioning based on consistent hashing in distributed storage systems, the dissertation uses the probabilities of the reachable states of the Petri Net model to describe the data partitioning result, and then introduces an imbalance coefficient to evaluate a partitioning. Using an optimization model and a dynamic programming algorithm, it determines how to partition the data when initializing a cluster and when scaling it.

· To improve replica consistency, the dissertation uses the Coloured Petri Net model to analyze the causes of replica inconsistency, and then proposes methods for measuring and monitoring replica consistency. It proposes several optimizations for better data-centric and client-centric replica consistency, including rearranging the order of events in the write queue, changing the degree of parallelism of the queue, and optimizing the quorum configuration.
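The "local-global" idea from the first contribution can be illustrated with a minimal sketch. It assumes each log record is a hypothetical `(timestamp, owner, event)` tuple, uses a simple directly-follows relation as a stand-in for a real mining algorithm, and takes the cross-owner links (for example, a send in one process enabling a receive in another) as hand-supplied input. The functions `split_by_owner`, `mine_local`, and `assemble` are illustrative names, not the dissertation's method, and the output is a relation graph rather than a finished elementary net.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical log record: (timestamp, owner, event name).
LogRecord = Tuple[int, str, str]
Edge = Tuple[str, str]


def split_by_owner(log: List[LogRecord]) -> Dict[str, List[str]]:
    """Local step: split the log into one time-ordered trace per owner."""
    traces: Dict[str, List[str]] = defaultdict(list)
    for _, owner, event in sorted(log):  # tuples sort by timestamp first
        traces[owner].append(event)
    return dict(traces)


def mine_local(trace: List[str]) -> Set[Edge]:
    """Mine one owner's trace. A directly-follows relation stands in
    for a real process-mining algorithm here."""
    return set(zip(trace, trace[1:]))


def assemble(local_models: Dict[str, Set[Edge]], links: Set[Edge]) -> Set[Edge]:
    """Global step: merge the local models and add cross-owner links,
    e.g. a send event in one owner enabling a receive event in another."""
    merged = set(links)
    for edges in local_models.values():
        merged |= edges
    return merged


log = [
    (1, "client", "send_put"),
    (2, "replica1", "recv_put"),
    (3, "replica1", "write_disk"),
    (4, "replica1", "send_ack"),
    (5, "client", "recv_ack"),
]
traces = split_by_owner(log)
local = {owner: mine_local(t) for owner, t in traces.items()}
model = assemble(local, {("send_put", "recv_put"), ("send_ack", "recv_ack")})
print(sorted(model))
```

Mining each owner's trace separately keeps every local model small and sequential; the interleaving between owners only enters through the explicit links, which is the readability gain the abstract attributes to splitting the log.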
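The partitioning imbalance addressed by the second contribution can be reproduced with a small consistent-hashing calculation. The sketch below places virtual nodes on a 32-bit ring and computes, for each physical node, the probability that a uniformly hashed key lands on it (the fraction of the ring it owns). Defining the imbalance coefficient as the largest share divided by the ideal share `1/n` is an assumption for illustration, since the abstract does not give the dissertation's formula. The MD5-based hash and the names `build_ring`, `ownership`, and `imbalance` are likewise hypothetical, and the sketch replaces the Petri Net reachable-state analysis and the dynamic programming step with direct arithmetic on the ring.

```python
import bisect
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple

RING = 2 ** 32  # size of the hash space (32-bit ring)
Ring = List[Tuple[int, str]]


def h(s: str) -> int:
    """Map a string to a ring position (first 4 bytes of MD5; an arbitrary choice)."""
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")


def build_ring(nodes: List[str], vnodes: int) -> Ring:
    """Place `vnodes` virtual nodes per physical node, sorted by position."""
    return sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))


def lookup(ring: Ring, key: str) -> str:
    """A key belongs to the first virtual node at or clockwise after its hash."""
    points = [p for p, _ in ring]
    i = bisect.bisect_left(points, h(key)) % len(ring)
    return ring[i][1]


def ownership(ring: Ring) -> Dict[str, float]:
    """Probability that a uniformly hashed key lands on each node, i.e. the
    fraction of the ring each node owns. Each virtual node owns the arc that
    starts after the previous point and ends at its own position."""
    share: Dict[str, float] = defaultdict(float)
    prev = ring[-1][0] - RING  # the first arc wraps around from the last point
    for pos, node in ring:
        share[node] += (pos - prev) / RING
        prev = pos
    return dict(share)


def imbalance(share: Dict[str, float]) -> float:
    """Assumed coefficient: largest share over the ideal share 1/n (1.0 = perfect)."""
    return max(share.values()) * len(share)


nodes = ["node-a", "node-b", "node-c", "node-d"]
for v in (1, 16, 256):
    print(v, round(imbalance(ownership(build_ring(nodes, v))), 3))
```

Comparing the printed coefficients for different virtual-node counts shows how strongly the random placement of ring points drives the imbalance, which is the quantity an optimization over node placement would try to reduce when a cluster is initialized or scaled.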
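One of the third contribution's optimizations, tuning the quorum configuration, can be illustrated with a textbook probabilistic model rather than the dissertation's Coloured Petri Net analysis. The sketch assumes `N` replicas, a write acknowledged by `W` of them, and a read that contacts `R` replicas chosen uniformly at random before the write has propagated any further. Under those assumptions a read is stale exactly when it misses all `W` updated replicas, which happens with probability `C(N-W, R) / C(N, R)`; this is zero whenever `R + W > N`. The cost model (minimize `R + W`) and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb
from typing import Optional, Tuple


def is_strict_quorum(n: int, r: int, w: int) -> bool:
    """Every read quorum overlaps every write quorum when R + W > N."""
    return r + w > n


def stale_read_probability(n: int, r: int, w: int) -> Fraction:
    """Chance that R replicas picked uniformly at random all miss the W
    replicas holding the latest write. Assumes the write has not yet
    propagated to the remaining N - W replicas."""
    return Fraction(comb(n - w, r), comb(n, r))  # comb(k, r) == 0 when r > k


def best_config(n: int, max_stale: Fraction) -> Optional[Tuple[int, int, Fraction]]:
    """Cheapest (R, W) meeting the staleness bound, with cost R + W
    (an assumed cost model); ties go to lower staleness, then smaller R."""
    candidates = [
        (r, w, stale_read_probability(n, r, w))
        for r in range(1, n + 1)
        for w in range(1, n + 1)
        if stale_read_probability(n, r, w) <= max_stale
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda c: (c[0] + c[1], c[2], c[0]))


print(is_strict_quorum(3, 2, 2))        # True
print(stale_read_probability(3, 1, 1))  # 2/3
print(best_config(5, Fraction(1, 5)))   # (2, 3, Fraction(1, 10))
```

Exact fractions avoid floating-point surprises when a configuration sits right on the staleness bound, as `R = 1, W = 4` does for `N = 5` in the last call.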
Keywords/Search Tags: coloured Petri Nets, distributed storage system, process mining, data partition, replica consistency