Research On Behavior Of Throughput Collapse In Cluster Based Storage Network

Posted on:2013-09-12

Degree:Doctor

Type:Dissertation

Country:China

Candidate:D A Huo

Full Text:PDF

GTID:1228330392457274

Subject:Computer system architecture

Abstract/Summary:


Cluster-based storage systems are low-cost, manageable, and scalable, and have been widely deployed in enterprise data centers, especially for cloud storage applications. These systems are generally built on high-bandwidth, low-latency TCP/IP Ethernet, and they serve a client's data request by having multiple storage servers send the requested data blocks at the same time. This many-to-one response pattern easily triggers the TCP Incast phenomenon. Under Incast, the client's effective throughput (called Goodput) drops to only about 20% of its level under normal network traffic, which wastes a large share of the network bandwidth. Research on the mechanism that produces Incast, and on solutions to it, therefore has real practical significance for enterprise data centers.

Firstly, the TCP Incast phenomenon was reproduced experimentally, both in simulation and in a real cluster environment, to show from both the micro and the macro point of view that it occurs commonly. Combining analysis of the simulation trace files with probabilistic modeling of timeouts as a random process yields an analytical, quantitative evaluation model that relates Goodput to the packet loss rate. The experimental trace analysis shows that TCP timeouts are the main cause of Incast. It also shows that the existing congestion control mechanisms and TCP protocol implementations cannot fully play their role in high-bandwidth, low-latency clustered storage networks. Data storage strategies and synchronous, highly concurrent application workloads further increase the probability that TCP timeouts occur.
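The dissertation's own quantitative model is not reproduced here. To illustrate roughly why a few timeouts can dominate Goodput, the following Python sketch uses a deliberately simplified stand-in model under stated assumptions. In this model, each flow times out independently with probability `p_flow`, and a read round that sees any timeout stalls for one RTO. The names `goodput_bps` and `round_timeout_prob`, and all parameter values, are hypothetical.

```python
# Hypothetical simplified Incast goodput model (NOT the dissertation's model).
# Assumptions: one client reads a block striped over n_servers; each server
# sends sru_bytes; the bottleneck link runs at link_bps; each flow suffers a
# retransmission timeout independently with probability p_flow; a round that
# sees any timeout stalls for one rto_s.


def round_timeout_prob(n_servers: int, p_flow: float) -> float:
    """Probability that at least one synchronized flow in a round times out."""
    return 1.0 - (1.0 - p_flow) ** n_servers


def goodput_bps(n_servers: int, sru_bytes: int, link_bps: float,
                p_flow: float, rto_s: float) -> float:
    """Expected application-level throughput (Goodput) in bits per second."""
    block_bits = n_servers * sru_bytes * 8
    transfer_s = block_bits / link_bps  # time to drain the block at line rate
    stall_s = round_timeout_prob(n_servers, p_flow) * rto_s  # expected idle time
    return block_bits / (transfer_s + stall_s)


if __name__ == "__main__":
    link = 1e9  # 1 Gbps Ethernet (illustrative)
    for p in (0.0, 0.01, 0.05):
        for rto in (0.2, 0.001):  # coarse 200 ms floor vs. a fine-grained 1 ms
            g = goodput_bps(16, 32 * 1024, link, p, rto)
            print(f"p_flow={p:<5} RTO={rto * 1000:>5.0f} ms  goodput={g / link:6.1%} of link")
```

With these illustrative numbers, a round's transfer takes only a few milliseconds. As a result, Goodput collapses once a timeout can stall a round for 200 ms, while shortening the stall to 1 ms recovers most of the link. The numbers are not the dissertation's measurements.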
The quantitative model explains why the existing TCP congestion control mechanism cannot deliver its benefits. It measures how far the effective network throughput (Goodput) degrades when a burst of packet losses causes several timeouts, and it allows the probability of an Incast timeout to be estimated. Together, these results provide a theoretical and practical basis for the next step of exploring TCP Incast optimizations and solutions.

Secondly, the modeling and experimental analysis of how TCP Incast forms show the root cause of this failure: RTOmin in the protocol implementation lacks fine-grained clock resolution. Linux kernel 2.6.18 and later versions support the high-resolution timer subsystem. By optimizing the TCP protocol implementation to use this fine-grained timer, RTOmin can be estimated with high accuracy and a fine-grained TCP retransmission timeout timer can be realized. This shortens the time storage nodes wait when responding to client requests and improves the bandwidth utilization of the data center.

Thirdly, in a cluster storage network, transient bursts of packet loss during transmission cause multiple TCP timeouts to break out at once, so the workload is also controlled from the application layer. This approach uses the network interface traffic control module in the Linux kernel. A script that implements the traffic control algorithm delivers all traffic control parameters and limits the rate at which multiple storage nodes transmit concurrently. This avoids network congestion and the transient packet loss events that cause TCP timeouts.
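To illustrate why a coarse RTOmin matters when round-trip times are on the order of 100 µs, here is a Python sketch of the standard retransmission-timeout estimator from RFC 6298, with an explicit lower clamp added. The estimator uses SRTT/RTTVAR smoothing with α = 1/8, β = 1/4, and K = 4. This is not the Linux kernel's code or the dissertation's patch. The 200 ms / 4 ms and 1 ms / 1 µs parameter pairs are illustrative assumptions.

```python
from typing import Optional

# Sketch of the RFC 6298 RTO estimator with a lower clamp rto_min.
# Illustrative only: the Linux kernel uses its own variant of this estimator,
# and the parameter values below are assumptions, not measured settings.

ALPHA, BETA, K = 1 / 8, 1 / 4, 4


class RtoEstimator:
    def __init__(self, rto_min: float, granularity: float) -> None:
        self.rto_min = rto_min          # lower bound on the timeout, seconds
        self.granularity = granularity  # clock granularity G, seconds
        self.srtt: Optional[float] = None
        self.rttvar = 0.0

    def on_sample(self, rtt: float) -> float:
        """Fold one RTT measurement (seconds) into the estimate; return the new RTO."""
        if self.srtt is None:
            self.srtt, self.rttvar = rtt, rtt / 2
        else:
            # RTTVAR is updated with the old SRTT before SRTT itself changes.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - rtt)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * rtt
        return self.rto()

    def rto(self) -> float:
        assert self.srtt is not None, "no RTT sample yet"
        return max(self.rto_min, self.srtt + max(self.granularity, K * self.rttvar))


if __name__ == "__main__":
    coarse = RtoEstimator(rto_min=0.200, granularity=0.004)  # 200 ms floor, 4 ms tick
    fine = RtoEstimator(rto_min=0.001, granularity=1e-6)     # hypothetical 1 ms floor, 1 us tick
    for rtt in (100e-6, 120e-6, 90e-6, 110e-6):  # data-center RTTs around 100 us
        coarse.on_sample(rtt)
        fine.on_sample(rtt)
    print(f"coarse RTO = {coarse.rto() * 1e3:.3f} ms, fine RTO = {fine.rto() * 1e3:.3f} ms")
```

With RTTs near 100 µs, the smoothed estimate itself is a fraction of a millisecond, so in both cases the clamp sets the timeout. The coarse floor gives 200 ms and the fine-grained one gives 1 ms, a 200x shorter stall after a burst of losses.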
The main idea of the traffic control strategy is that all storage nodes transmitting simultaneously share the bottleneck network resources equally. That is, no synchronized transport stream may exceed its share of the maximum rate the bottleneck link can provide.

Finally, two optimizations are applied in the distributed TRAP-based continuous data protection system to lower response time and improve the performance of the cluster storage network. For network Incast, combining RTOmin optimization with traffic control effectively improves client throughput (Goodput) and reduces user response time. For the TRAP server's local disk I/O traffic, a buffer chain strategy based on the principle of temporal locality reduces the total number of IOs the TRAP server issues to its local disk, which also reduces user response time.

In summary, the research above on the TCP Incast generation mechanism and its solutions for cluster storage networks forms the main body of this dissertation.
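The equal-share rule can be sketched in Python as a per-sender rate cap of bottleneck/N, enforced by a token bucket. This is a user-space illustration under assumptions, not the dissertation's traffic-control script. The names `fair_share_bps` and `TokenBucket`, and the optional `headroom` factor, are hypothetical. In the kernel, the Linux `tc` Token Bucket Filter (tbf) queueing discipline implements the same token-bucket mechanism.

```python
# Sketch of the equal-share idea: cap each of n synchronized senders at
# bottleneck / n and enforce the cap with a token bucket.
# All names and the headroom factor are hypothetical illustrations.


def fair_share_bps(bottleneck_bps: float, n_senders: int, headroom: float = 1.0) -> float:
    """Per-sender rate cap so that n senders together do not exceed the bottleneck."""
    if n_senders <= 0:
        raise ValueError("n_senders must be positive")
    return headroom * bottleneck_bps / n_senders


class TokenBucket:
    def __init__(self, rate_bps: float, burst_bytes: float, now: float = 0.0) -> None:
        self.rate = rate_bps / 8.0   # refill rate, bytes per second
        self.capacity = burst_bytes  # maximum burst, bytes
        self.tokens = burst_bytes    # start with a full bucket
        self.last = now              # time of the last refill, seconds

    def allow(self, nbytes: int, now: float) -> bool:
        """Refill for the elapsed time, then admit the packet if enough tokens remain."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # the caller should delay this send, not drop it


if __name__ == "__main__":
    cap = fair_share_bps(1e9, 16)  # 16 storage nodes behind a 1 Gbps client link
    bucket = TokenBucket(cap, burst_bytes=3000)
    sent = [bucket.allow(1500, t * 1e-4) for t in range(20)]
    print(f"per-node cap = {cap / 1e6:.1f} Mbps, packets admitted: {sum(sent)}/20")
```

Setting `headroom` slightly below 1 leaves room for protocol overhead and imperfect synchronization. The burst size bounds how many bytes one node can push into the switch buffer at once, which is what keeps the synchronized senders from overflowing it.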

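The abstract does not describe the buffer chain in detail. As a generic illustration of how temporal locality cuts local disk IOs, the sketch below places an LRU block buffer in front of a hypothetical `read_from_disk` callback and counts the requests that actually reach the disk. LRU is a stand-in chosen for illustration, not the dissertation's buffer chain strategy.

```python
from collections import OrderedDict
from typing import Callable

# Generic LRU block buffer (a stand-in, NOT the dissertation's buffer chain).
# read_from_disk is a hypothetical callback that fetches one block.


class LruBlockBuffer:
    def __init__(self, capacity: int, read_from_disk: Callable[[int], bytes]) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.read_from_disk = read_from_disk
        self.blocks: "OrderedDict[int, bytes]" = OrderedDict()
        self.disk_ios = 0  # number of requests that reached the local disk

    def read(self, block_id: int) -> bytes:
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # hit: mark as most recently used
            return self.blocks[block_id]
        self.disk_ios += 1                     # miss: go to the local disk
        data = self.read_from_disk(block_id)
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)    # evict the least recently used block
        self.blocks[block_id] = data
        return data


if __name__ == "__main__":
    buf = LruBlockBuffer(2, lambda b: f"block-{b}".encode())
    for b in [1, 2, 1, 1, 2, 3, 1]:
        buf.read(b)
    print(f"requests: 7, disk IOs: {buf.disk_ios}")  # prints "requests: 7, disk IOs: 4"
```

Repeated accesses to recently used blocks are served from memory, so only 4 of the 7 requests in this access pattern touch the disk.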
Keywords/Search Tags:

Cluster storage, Network performance, RTO optimization, Traffic control, Distributed continuous data protection


Related items

1	Research On Data Consistency And Load Balanceing Optimization Of Distributed Cluster System
2	Research On Key Technologies Of Data Storage Management Oriented Continuous Data Protection
3	Research On NVDIMM's Application On Data Protection In Distributed Storage System
4	Research On File System Level Continous Data Protection Technology In Distributed Storage System
5	Design And Implementation Of A Network-based Continuous Data Protection System
6	Distributed Continue Data Protection Solution
7	Research On The Key Technologies Of Network Performance Optimization Based On Network Traffic Monitor And Control
8	Research On Performance Testing And Optimization Based On Massive Data Storage
9	The Design And Implementation Of A Continuous Data Protection System Server
10	Efficient Handling And Performance Optimization For Mobile Internet Traffic Data