
Research On The Methods For Network Performance Improvement In Computer Cluster System

Posted on: 2014-10-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Wan
GTID: 1228330425473296
Subject: Computer system architecture
Abstract/Summary:
In high-performance computer clusters, interconnecting nodes with high-speed network devices has become the mainstream approach. The network components are an important part of the cluster architecture and directly influence many of its performance indicators. A cluster network also has distinctive features, such as high bandwidth, low latency, and high reliability, that set it apart from an ordinary network environment. If the network components cannot take full advantage of these features, they easily become the bottleneck of the cluster system. It is therefore necessary to optimize the cluster network according to the characteristics of high-performance cluster networks.

Remote Procedure Call (RPC) is a network component and communication technology widely used in cluster and distributed systems. However, the performance of traditional RPC degrades seriously in clusters built on high-speed networks, mainly because traditional RPC does not take full advantage of the cluster network. Our study of traditional RPC suggests that the performance loss is caused mainly by the serialization/deserialization process. We therefore propose SimpSerial, a dedicated serialization/deserialization scheme for homogeneous cluster systems. The scheme improves cluster network performance by reducing the number of data copy operations incurred during traditional serialization/deserialization. We implemented the scheme in a real-world cluster system and evaluated it in detail. The results show that SimpSerial significantly improves cluster network performance.

In a high-performance cluster system, frequent timeouts seriously hinder overall performance. Traditional adaptive timeout mechanisms adjust the timeout value dynamically and are therefore widely used in RPC systems, but they have shortcomings and need further improvement. By carefully analyzing the client and server sides when a timeout occurs, we found two serious problems in the traditional stand-alone timeout mechanism. First, when a timeout occurs, the server side often enters a state that we call "Task Congestion". Second, the timeout value is only ever adjusted upward, so it grows larger and larger. Based on this analysis, we propose a Multi-Ranged RPC Timeout Mechanism that divides the whole range of timeout values into two sub-ranges and uses a different adjustment method depending on which sub-range the current timeout value falls in. The mechanism supports a larger range of timeout values and adjusts the timeout value faster, which alleviates the "Task Congestion" problem to some extent. It also includes a novel algorithm that reduces the timeout value to a reasonable range when it becomes too large. The mechanism therefore has good accuracy and adaptability.

Transmission Control Protocol (TCP) and Remote Direct Memory Access (RDMA) are both widely used in cluster systems. Traditional analyses generally agree that TCP easily becomes the bottleneck in a high-speed network environment and attribute its overhead mainly to the repeated memory copy operations during protocol processing, which lead to low efficiency and excessive CPU cycle consumption. We re-analyzed the performance of TCP and RDMA on a modern server platform and found that the processing efficiency and performance of TCP improve with the rapid development of computer architecture and hardware components. On a modern cluster platform, TCP can achieve very good performance, whereas the compatibility problems and programming complexity introduced by RDMA remain serious. TCP therefore still has a wider range of applications and considerable room for improvement on modern cluster platforms.

Based on this conclusion, we designed and implemented the network middle layer of our Cappella cluster system. It provides two main transmission methods, both working in user space: RPC over TCP and RPC over RDMA. Tests show that both methods achieve good performance. Drawing on this real network middle layer design, we summarize the methods used to improve the network middle layer in cluster systems. We consider reducing data copy operations in all network layers and using multiple streams in TCP to be two typical methods for improving network performance in cluster systems.
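
To make the multi-ranged timeout idea described above more concrete, the following is a minimal Python sketch, not the dissertation's actual algorithm. The abstract only states that the timeout range is split into two sub-ranges with different adjustment methods and that an overly large value is brought back down. Everything else here is an assumption chosen for illustration: the class name `MultiRangedTimeout`, the boundary and limit values, doubling in the lower sub-range, fixed-step growth in the upper sub-range, and the rule that moves the value halfway toward twice the observed latency.

```python
from dataclasses import dataclass


@dataclass
class MultiRangedTimeout:
    """Illustrative two-range adaptive timeout; every constant is an assumption."""

    value: float = 1.0      # current timeout, in seconds
    minimum: float = 0.5    # lower bound of the whole range
    boundary: float = 8.0   # split point between the two sub-ranges
    maximum: float = 64.0   # upper bound of the whole range
    shrink: float = 0.5     # fraction of the excess removed per successful call

    def on_timeout(self) -> float:
        """Raise the timeout after a call has expired."""
        if self.value < self.boundary:
            # Lower sub-range: grow multiplicatively to react quickly.
            self.value *= 2.0
        else:
            # Upper sub-range: grow by a fixed step to avoid overshooting.
            self.value += 4.0
        self.value = min(self.value, self.maximum)
        return self.value

    def on_success(self, latency: float) -> float:
        """Lower the timeout when it is well above the observed latency."""
        target = max(self.minimum, 2.0 * latency)
        if self.value > target:
            # Move part of the way back toward the target instead of
            # letting the timeout only ever increase.
            self.value -= self.shrink * (self.value - target)
        return self.value
```

A caller would invoke `on_timeout()` whenever an RPC expires and `on_success(latency)` whenever a reply arrives, then use the returned value as the deadline for the next call. The downward path is the part that addresses the "only ever adjusted upward" problem noted in the abstract.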
Keywords/Search Tags: Cluster, Remote Procedure Call, Serialization, Timeout, Transmission Control Protocol, Remote Direct Memory Access