
The Communication Tools Supporting Asynchronous Checkpoint In User Layer

Posted on: 2006-01-30
Degree: Master
Type: Thesis
Country: China
Candidate: G P Hao
Full Text: PDF
GTID: 2178360182457149
Subject: Software engineering
Abstract/Summary:
With the development of network technology, and especially in the face of increasingly complex large-scale tasks, people want to hide the differences between computers so that a complicated computation can be handled as a whole, regardless of how the individual systems differ. Distributed computing systems are a satisfactory answer to this need. Their structure addresses the insufficient capacity of a single computer for large-scale tasks, and it has pushed computer applications to a new peak that draws more and more people to the field. In a distributed computing environment, however, the probability that the system fails increases with the number of machines in the system. Distributed programs are also usually computation-heavy and often run for a very long time. If the mean time between failures of the system is shorter than the execution time of the program, and the program starts over from the beginning after every failure, it may never run to completion. To avoid wasting a large amount of computation by restarting from scratch after such failures, and to improve system availability, checkpoints can be set at suitable moments during normal operation: the process automatically records its intermediate running state and saves it as a checkpoint. When a failure occurs, the computation can then roll back to the most recently saved state. Many distributed computing systems now use checkpoint algorithms in their design and implementation, which not only provides fault tolerance but also serves as the basis for process migration and dynamic load balancing.
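The checkpoint-and-rollback idea described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the JSON checkpoint format, the checkpoint interval, and the simulated failure are all assumptions made for illustration. A long-running loop saves its intermediate state periodically, and a restart continues from the last saved state instead of from zero.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temporary file, then atomically replace the checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # the old checkpoint is overwritten in one step


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"next_i": 0, "total": 0}
    with open(path) as f:
        return json.load(f)


def run(path, n, interval=100, crash_at=None):
    """Sum 0..n-1, saving a checkpoint every `interval` steps (hypothetical task)."""
    state = load_checkpoint(path)
    for i in range(state["next_i"], n):
        if crash_at is not None and i == crash_at:
            raise RuntimeError("simulated failure")
        state["total"] += i
        state["next_i"] = i + 1
        if state["next_i"] % interval == 0:
            save_checkpoint(path, state)
    return state["total"]
```

If `run` fails partway through, calling it again with the same path repeats only the work done since the last checkpoint. Writing to a temporary file and renaming it is one common way to avoid leaving a half-written checkpoint behind; the thesis does not say which method it uses.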
In recent years, operating system research, especially on Linux and distributed operating systems, has also gradually adopted checkpoint technology to give the operating system itself fault-tolerant capabilities. In view of this, we propose a user-layer communication tool that supports checkpoints, implemented in two variants: synchronous and asynchronous. This thesis covers the communication tool supporting asynchronous checkpoints in the user layer. The design goal is that, once implemented and applied to real job communication, the tool is expected to improve data transmission capability and to make each process of a distributed program largely fault-tolerant. Because the tool is built with users in mind, it is also designed to interact well with them.

First, we designed the tool on the client/server (C/S) model, designing the server and the client separately to account for their differences. Each is divided into a user program module, a communication and connection module, a checkpoint mechanism module, and a recovery module, and we define the calling relationships and interactions between these modules. We then give a detailed design of each module's function and internal procedure. Finally, we perform integration testing and examine the system's functions in each area.

The main innovations of the tool are as follows:

1. The system lets users take part in the execution of the process, for example by deciding whether to resume a crashed process and how to handle various signals. It also keeps users informed about the process as far as possible: for example, the system can show the user the storage path and report the state of the process at any time, and the user can inspect the checkpoint file.

2. It sets up a static variable in the user program. The variable serves two purposes. First, it limits the number of processes, to guarantee the quality and efficiency of data transmission with other processes. Second, when the program starts, the value of the variable determines whether a process needs to be resumed. The initial value is 0; it is incremented by 1 when a child process is created and decremented by 1 when a child process finishes, and it is unchanged otherwise. Because it is a static variable, its value is retained when the process terminates abnormally, so at that point the value equals the number of child processes.

3. It defines a buffer at the server end and at the client end; each reads data from the socket and returns a value. Once the server has sent a file and the client has received it, the checkpoint content can be verified immediately by checking whether the returned value is correct, and it is then written to the checkpoint file by overwriting. In this way the checkpoints stay consistent and the domino effect is avoided.

4. It places a flag field in the checkpoint format, whose value increases each time a checkpoint is saved. When a process has to be resumed after a crash, the server and the client each retrieve the last flag value from their checkpoints, and the server program coordinates them to reach a consistent state. The process then continues from that state, which achieves the original goal.

From integration and testing of the system we conclude that the communication tool supporting user-layer asynchronous checkpoints does improve communication efficiency in a distributed computing environment, and thereby improves system reliability.

The tool provides the following basic functions:

1. Real-time transmission of large amounts of data: the server and the client can transmit large amounts of data in real time without having to worry about a system crash during transmission.

2. User interaction: as described above.

3. Resumable (breakpoint) transmission: when the program restarts after a system crash, data transmission resumes from the breakpoint according to the checkpoint file, which improves the system's data communication capability.

The use of checkpoint mechanisms in distributed environments is a new research field with many open questions, and this thesis is only a first attempt. As checkpoint algorithms develop, we expect them to play a larger role in distributed computing environments in the near future.
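The interplay of the increasing checkpoint flag, server-side coordination, and resumable transmission can be sketched as follows. This is an illustration under stated assumptions, not the thesis's code: `Checkpoint`, `Endpoint`, `agree`, and `transfer` are hypothetical names, the transfer is simulated in memory rather than over sockets, and the "returned value" check is modelled as a comparison of byte counts.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    seq: int = 0     # flag value; grows by one with every saved checkpoint
    offset: int = 0  # number of bytes confirmed so far


class Endpoint:
    """One side of the transfer; keeps a single, overwritten checkpoint."""

    def __init__(self):
        self.checkpoint = Checkpoint()

    def save(self, offset):
        # Overwrite the previous checkpoint instead of keeping a history.
        self.checkpoint = Checkpoint(self.checkpoint.seq + 1, offset)


def agree(server, client):
    """Coordination step: both sides fall back to the older checkpoint."""
    older = min(server.checkpoint, client.checkpoint, key=lambda c: c.seq)
    server.checkpoint = Checkpoint(older.seq, older.offset)
    client.checkpoint = Checkpoint(older.seq, older.offset)
    return older.offset


def transfer(data, received, server, client, chunk=4, fail_after=None):
    """Send data in chunks from the agreed offset; return True when complete."""
    offset = agree(server, client)
    del received[offset:]  # discard bytes beyond the consistent state
    sent = 0
    while offset < len(data):
        if fail_after is not None and sent == fail_after:
            return False  # simulated crash in the middle of the transfer
        block = data[offset:offset + chunk]
        received.extend(block)
        offset += len(block)
        # Save a checkpoint only after the receiver's byte count checks out.
        if len(received) == offset:
            client.save(offset)
            server.save(offset)
        sent += 1
    return True
```

Calling `transfer` again after a failure resumes from the agreed offset rather than from the start of the data. Falling back to the checkpoint with the smaller flag value is one simple way to reach a state both sides have actually saved; the abstract does not specify the coordination rule the server uses.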
Keywords/Search Tags: Checkpoint, Linux system, Distributed system, Asynchronous, Process