Design And Implementation Of Resolution To Node Failure Using Distributed Checkpoints And Message Logging Technique

Posted on: 2008-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: C S Jiang
Full Text: PDF
GTID: 2178360212496825
Subject: Computer system architecture
Abstract/Summary:
Undeniably, the emergence of the computer is a miracle of human development. Since its emergence, tremendous changes have taken place in people's lives, both in science and technology and in daily life.

However, as the use of computers steadily grows and dependence on them deepens, traditional computer systems have become unable to meet people's needs. People constantly yearn for more reliable, faster and cheaper computer systems. This is the reason for the appearance of Distributed Computing Systems. A Distributed Computing System is composed of several interconnected processing resources (computers) that cooperate in carrying out a task under the control of the system, with minimal dependence on centralized procedures, data and hardware. These resources can be physically adjacent, or they may be dispersed.

Compared with a traditional computer system, a Distributed Computing System has many advantages: it is more flexible, more stable, better able to withstand failure, offers higher computing power and is much cheaper. That is why Distributed Computing Systems have developed rapidly and continuously.

Since the 1990s, the rapid development of computer software and hardware technologies and of computer network technology has greatly accelerated the development of distributed systems. With this development, theory and research in Distributed Computing Systems have produced many important achievements. These systems have begun to move out of the laboratory environment and to change people's lives. Currently there are plenty of Distributed Computing Projects, covering mathematics, meteorology, ecology and many other fields.
Distributed Computing Systems are now making a growing contribution to improving our standard of living.

A great deal of theory and experience from Distributed Computing Systems is worth learning from. This paper develops a solution to the problem of node failure using the Independent Checkpointing Algorithm and the Passive Message Logging Algorithm, both of which come from Distributed Computing Systems.

In both research and commercial settings, it is often necessary to carry out long-running tasks or to provide a reliable, uninterrupted service. If the node on which several tasks are running breaks down, the work done so far is lost, and the resulting loss can be enormous. If we borrow the ideas of distributed systems and use hardware and data redundancy to achieve fault tolerance, we can provide a reliable service. Furthermore, if we can provide a way to relocate the tasks that were running on the failed node, the losses caused by the fault are minimized. This is the major work of this paper.

Based on the basic theory of distributed systems, the paper first introduces the concept, history, overview and features of Distributed Computing Systems, to bring readers into the area. It then introduces the fault models and the common fault-tolerance approaches of Distributed Computing Systems. After a comparative analysis of the various solutions, we consider the combination of the Independent Checkpointing Algorithm and the Passive Message Logging Algorithm to be the best solution to the problem of node failure. Such a solution has many advantages: it uses fewer resources, checkpoints are simple to produce, it is fast to implement, recovery is simple, space requirements are small, and there is no domino effect. By adding some features of Distributed Computing Systems, we extend its application to node faults in general systems. With the solution selected, we design the system using multiple threads and a pipeline.
In this way, we divide the whole processing component into several sub-components, each of which completes a specific task. Dividing a large task into several sub-tasks running on sub-components greatly enhances the degree of parallel processing and the throughput. Finally, we implemented the solution and verified through testing that it correctly and effectively solves the node failure problem.

Using the solution in this paper incurs more overhead than before, so the efficiency of the system is lowered, and the solution is not suitable for short-running tasks. However, if we take the loss caused by node failure into account, the cost is still worthwhile. Furthermore, the longer the application runs, the greater the benefit. Even so, we have optimized the design to minimize the cost of the solution. The solution is effective.

Looking over the whole paper, we organized the development along the following line: research, analysis, design, implementation, testing, summary and evaluation. Objectively, our work achieves the desired goal. What we have done provides useful guidance for minimizing the loss caused by node failure and lays a good groundwork for further study; lowering the difficulty of distributed programming is also a significant contribution. As indicated earlier, this work is meaningful, but the main work of the paper focuses on implementing the system functions and demonstrating feasibility. Many details can still be improved and optimized, and further work remains to be done.
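
The general idea of combining independent checkpoints with message logging can be illustrated with a minimal Python sketch. It is not the implementation described in this paper. All names here (`StableStorage`, `Worker`, `demo`) are hypothetical, and the sketch assumes that message logging means the receiver records each message in failure-surviving storage before processing it, and that message processing is deterministic.

```python
import copy


class StableStorage:
    """Hypothetical stand-in for storage that survives a node failure."""

    def __init__(self):
        self.checkpoint = None  # (state snapshot, number of messages applied)
        self.log = []           # every delivered message, in delivery order


class Worker:
    """A task whose state is a running total of the messages it receives."""

    def __init__(self, storage, state=None, applied=0):
        self.storage = storage
        self.state = state if state is not None else {"total": 0}
        self.applied = applied  # messages already reflected in self.state

    def deliver(self, msg):
        # Log first, then process, so a delivered message is never lost.
        self.storage.log.append(msg)
        self._apply(msg)

    def _apply(self, msg):
        self.state["total"] += msg
        self.applied += 1

    def take_checkpoint(self):
        # Independent checkpoint: no coordination with any other process.
        self.storage.checkpoint = (copy.deepcopy(self.state), self.applied)

    @classmethod
    def recover(cls, storage):
        # Restart (possibly on another node) from the last checkpoint,
        # then replay only the messages logged after that checkpoint.
        if storage.checkpoint is None:
            state, applied = {"total": 0}, 0
        else:
            snapshot, applied = storage.checkpoint
            state = copy.deepcopy(snapshot)
        worker = cls(storage, state, applied)
        for msg in storage.log[applied:]:
            worker._apply(msg)
        return worker


def demo():
    storage = StableStorage()
    worker = Worker(storage)
    for msg in (1, 2, 3):
        worker.deliver(msg)
    worker.take_checkpoint()
    for msg in (4, 5):
        worker.deliver(msg)
    expected = dict(worker.state)
    del worker  # simulate the node failing after the checkpoint
    recovered = Worker.recover(storage)
    return expected, recovered.state, recovered.applied
```

Calling `demo()` returns the state before the simulated failure, the state after recovery, and the number of messages applied. In this sketch the recovered state matches the pre-failure state because the two messages delivered after the checkpoint are replayed from the log.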
Keywords/Search Tags:Implementation