The Communication Tools Supporting Synchronous Checkpoint In User Layer

Posted on: 2006-02-09  Degree: Master  Type: Thesis
Country: China  Candidate: J Y Jiao  Full Text: PDF
GTID: 2168360155953062  Subject: Software engineering
Abstract/Summary:
As cluster systems grow in scale, the probability of failures increases dramatically. When a server node fails, it can no longer answer client requests, which causes losses for users and for other nodes. Fault tolerance is therefore one of the key functions of a highly available network server cluster. In this thesis, checkpoints are relied on as far as possible to achieve fault tolerance.

Most Internet services are built on TCP and are subject to the constraints of a connection-oriented protocol. UDP, by contrast, is connectionless, which makes fault tolerance much easier to implement. This thesis therefore focuses on the multiplexed I/O server model and on the advantages and disadvantages of that model. All checkpoint mechanisms in this thesis are implemented at the user level.

The conventional fault-tolerance method is logging: information produced by running processes is recorded on a stream medium that programs can read under both normal and abnormal conditions. The logs discussed here have the following features:

1. Program-readable only: the logs are meant to be read by programs, not by people. They resemble the logs used in database systems rather than ordinary logs such as those of a UNIX system.
2. Stream mode: logs are written sequentially, like a tape cassette, rather than at random positions, which distinguishes them from other files. Programs can nevertheless read the logs at random positions at any time.

By method and purpose, recent file systems that use logs fall into the following classes:

1. The file system is organized entirely in log format, as in the log-structured file system (LFS).
2. The log serves as temporary storage that guarantees the stability and integrity of the file system, as in IBM's JFS and ext3 on Linux.
3. The log is combined with main memory to form part of the file system's buffer cache in order to improve performance, as in DCD.

Checkpoint storage, however, differs considerably from log storage. Checkpoints are written permanently to disk, where they cannot easily be deleted, which improves safety. Log records, by contrast, are converted to permanent form only once the buffer has filled to a certain level, so a failure that occurs before the buffer is flushed loses data.

Checkpoint algorithms are divided into single-process and distributed checkpoint algorithms. Distributed checkpoint algorithms, the main subject of this study, are in turn classified into asynchronous checkpoint algorithms and synchronous (consistent) checkpoint algorithms. The author adopts the synchronous checkpoint algorithm to avoid the domino effect and to reduce resource consumption. When a global checkpoint is taken, not only the local checkpoint information is saved but also the messages exchanged between processes.

The synchronous checkpoint algorithm, also called the globally consistent checkpoint algorithm, is the core of this thesis. When the running state is saved, all processes are coordinated to take their local checkpoints at an appropriate time, and these local checkpoints are combined into one globally consistent checkpoint. The advantages of the synchronous checkpoint algorithm are that only the latest checkpoint files are needed, little storage space is consumed, and no domino effect occurs when the running state is restored. The disadvantage is that checkpointing significantly interferes with the execution of the processes.

The author explains the implementation of a communication method that supports user-level checkpoints and analyzes it theoretically.
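
The coordination behind a synchronous checkpoint can be illustrated with a minimal in-memory sketch. It is not the thesis's implementation: the `Process` class, its `counter` and `inbox` fields, and the `global_checkpoint` coordinator are hypothetical names chosen for illustration, and a real system would write the state to checkpoint files and exchange the requests over sockets.

```python
import copy


class Process:
    """A toy process whose whole state is a counter and an inbox (assumed model)."""

    def __init__(self, name):
        self.name = name
        self.counter = 0
        self.inbox = []          # messages received but not yet handled
        self.tentative = None    # local checkpoint awaiting commit
        self.committed = None    # latest committed local checkpoint

    def take_tentative(self):
        # Phase 1: record the local state together with pending messages.
        self.tentative = copy.deepcopy(
            {"counter": self.counter, "inbox": self.inbox}
        )
        return True

    def commit(self):
        # Phase 2: the tentative checkpoint replaces the previous one,
        # so only the latest checkpoint is ever kept.
        self.committed, self.tentative = self.tentative, None

    def rollback(self):
        # Restore the state saved in the latest committed checkpoint.
        state = copy.deepcopy(self.committed)
        self.counter, self.inbox = state["counter"], state["inbox"]


def global_checkpoint(processes):
    """Coordinator: commit only if every process took a tentative checkpoint."""
    if all(p.take_tentative() for p in processes):
        for p in processes:
            p.commit()
        return True
    for p in processes:          # otherwise discard the partial attempt
        p.tentative = None
    return False


def demo():
    client, server = Process("client"), Process("server")
    client.counter = 1
    server.inbox.append("request-1")      # message received but not yet handled
    global_checkpoint([client, server])
    client.counter = 5                    # progress made after the checkpoint
    server.inbox.clear()
    for p in (client, server):            # a failure forces both to roll back
        p.rollback()
    return client.counter, server.inbox
```

Because every process commits in the same round, rolling back never has to reach past the latest checkpoint, which is why the domino effect does not arise in this scheme.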
The system consists of two processes that communicate with each other and can be applied to the client/server (C/S) model. Sockets and the TCP/IP protocols are used to communicate over the Internet. The programs are written so that checkpoints are set during execution and a rollback is performed when a failure occurs. Meanwhile, the synchronous checkpoint algorithm runs on the other process to keep the checkpoints synchronized. The checkpoints also cover network, process, and file state, as well as the saving and restoring of user information. After an interruption, the checkpoint function restores the processes and sends a signal to the other side to resume communication. Both the client and the server have a checkpoint system. Particular attention is paid to state maintenance so that the client's state remains consistent with the server's.

The author presents a practical socket-based method for inter-process communication, in which a separate process is created to act as a communication server that relays information between processes. The server first opens a listening socket to watch for connection requests. It then places the descriptor in an fd_set, a previously declared set that holds the listening socket and the descriptors of the communication sockets created later. The server calls the select system call to check all sockets in the set at once for incoming data, that is, for client requests. When a connection request arrives, the server creates a new communication socket connected to the client, adds its descriptor to the fd_set, and stores the client's ID together with the corresponding socket descriptor in an ID list. When a later request arrives, the server extracts the data and the recipient's ID, looks up the recipient's socket descriptor in the ID list, and forwards the data to the receiving client through that socket. All other processes are clients.
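
The relay loop described above might look roughly like the following sketch, written in Python, whose `select.select` plays the role of the select system call and its fd_set. The wire format is an assumption made only for this example: a client's first message is its ID, and later messages have the form `recipient:payload`. The names `make_listener`, `relay_step`, `watched`, and `id_table` are hypothetical, and message framing and error handling are omitted.

```python
import select
import socket


def make_listener():
    """Open the listening socket on the loopback interface."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS choose a free port
    listener.listen()
    return listener


def relay_step(listener, watched, id_table, timeout=1.0):
    """One pass of the loop: wait in select(), then accept, register or forward."""
    readable, _, _ = select.select([listener] + watched, [], [], timeout)
    for sock in readable:
        if sock is listener:
            conn, _addr = listener.accept()      # new communication socket
            watched.append(conn)
            continue
        data = sock.recv(4096)
        if not data:                             # peer closed the connection
            watched.remove(sock)
            for cid in [c for c, s in id_table.items() if s is sock]:
                del id_table[cid]
            sock.close()
        elif sock not in id_table.values():
            # Assumed protocol: the first message carries the client's ID.
            id_table[data.decode().strip()] = sock
        else:
            # Assumed protocol: b"recipient:payload".
            dest, _, payload = data.partition(b":")
            target = id_table.get(dest.decode())
            if target is not None:
                target.sendall(payload)


def demo():
    listener = make_listener()
    watched, id_table = [], {}
    port = listener.getsockname()[1]
    alice = socket.create_connection(("127.0.0.1", port))
    bob = socket.create_connection(("127.0.0.1", port))
    for _ in range(10):                          # accept both connections
        if len(watched) == 2:
            break
        relay_step(listener, watched, id_table)
    alice.sendall(b"alice")
    bob.sendall(b"bob")
    for _ in range(10):                          # register both IDs
        if len(id_table) == 2:
            break
        relay_step(listener, watched, id_table)
    alice.sendall(b"bob:hello")
    relay_step(listener, watched, id_table)      # forward alice's message
    bob.settimeout(2.0)
    received = bob.recv(4096)
    for s in (alice, bob, listener, *watched):
        s.close()
    return received
```

A single thread serves every client here, which is the main appeal of the multiplexed I/O model; the cost is that one slow handler delays all other connections.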
Once a client has established a socket connection with the server, it can send and receive information through the communication socket. If a node fails, the server receives a signal sent with kill(), interrupts its communication with that node, and saves the node's communication state to a checkpoint file. When the node is ready to resume communication, it sends a signal to the server and waits for a response from the watch process. The server then restarts the communication from the previously saved checkpoint file.

The code for setting and restoring checkpoints is modular. Checkpoints are added to a program through function calls, which makes the system more powerful and convenient to use. Testing is likewise done through function calls. Additional methods are provided to monitor the state of all checkpoints until they meet all requirements.

Conclusions: The author designed the communication mechanism and the checkpoint facility, both of which were revised during coding. Once the code was complete, a variety of methods were used to test its performance. With the parameters such as system stability,...
Keywords/Search Tags: Communication