Design and Implementation of a Fault-Tolerant Recovery Mechanism Based on the Platform EGO Web Service Gateway

Posted on: 2008-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: X L Ding
Full Text: PDF
GTID: 2178360212496011
Subject: Computer software and theory
Abstract/Summary:
From OGSI to WSRF, grid computing has gradually adopted Web services and SOA technologies to solve the problem of resource sharing in the heterogeneous environments of science, engineering, and commerce. Platform EGO is an SOA-based grid platform newly released by Platform Computing Inc. Unlike traditional cluster computing environments such as LSF, PBS, and SGE, Platform EGO is built on open standards and architectures (Web services, SOA, and virtualization) to allow for extensibility and flexibility in managing shared resources across geographically dispersed sites for diverse enterprise applications, services, and workloads.

The Platform EGO web service gateway (WSG) is grid middleware that enables applications, called web service clients (WSCs), to access Platform EGO services as web services. The WSG runs as part of EGO and provides a standard Web interface through which WSCs access EGO.

The WSG is designed to achieve high reliability, performance, and interoperability. To support a huge user base and reduce response time, WSGs can work in a cluster model, with the load dynamically balanced among them. Moreover, a lightweight notification mechanism is implemented to provide better interoperability between the WSG and WSCs. Furthermore, we enhanced the WS-Security UsernameToken profile so that Platform EGO can support role-based access control more flexibly. We designed a session-based asynchronous recovery algorithm to realize WSG fault tolerance; it has a short freezing time and is able to isolate the recovery process for each WSC.

This paper designs and implements that session-based asynchronous recovery algorithm. The design is based on the characteristics of the Platform EGO WSG and on existing fault-tolerance mechanisms.

First, we survey and analyze fault-tolerance technology. The basic idea of fault tolerance originated in hardware fault tolerance, whose theory and application have developed substantially. Hardware fault tolerance has become a mature technology and is applied in practical systems.
Software fault tolerance extends the basic idea of hardware fault tolerance. Fault-tolerance technology has been studied in depth for traditional distributed computing systems. On the hardware side, we can use dual backup and triple modular redundancy. In software, we can use replication, disk shadowing, checkpoints, and other techniques.

Although much work has been done in the field of distributed system recovery, research on web service fault tolerance is still new. Currently there are no standard specifications dealing with fault tolerance in web services, and existing research focuses on different characteristics of web services.

Normally a web server does not maintain active connections with clients and is stateless. Hence, in many cases, a very simple protocol is used to handle web service crashes: a service monitor mechanism detects the service fault, and future requests from clients are redirected to redundant servers.

Unlike traditional Web services, the connection between the Platform EGO WSG server and its clients is stateful. Consequently, the WSG must consider the state of both client and server when it handles requests from WSCs. The WSG decides whether and how to handle a WSC's request according to the state of that WSC.

Next, the paper presents an important concept in the WSG: the session. It relates to the WSG's recovery, notification, performance tuning, and so on. The WSG has two kinds of sessions. The sessions between WSCs and the WSG are called client sessions. The sessions maintained between the WSG and EGO services are called service sessions. After a client session is created, the WSG sets up multiple service sessions with specific EGO services to perform the requests from the client session. Once a client session is closed by the WSC, the WSG immediately closes all the corresponding service sessions.

Communication among WSCs, the WSG, and EGO services does not follow a simple request-response model. Notifications play a very important role, especially for WSCs designed as event-driven applications. Therefore, the WSG must rebuild the service sessions and the notification mechanism after a restart; otherwise, deadlocks can occur.

For example, suppose a WSC sends the WSG a request for resource allocation and a connection is established. Before the resources that EGO allocated are returned to the WSC, the WSG crashes and restarts. The WSG cannot rebuild the service sessions by itself, because it knows neither the session's credential nor the WSC's username/password. Meanwhile, the WSC will not send any further request until it receives the resource allocation notification. However, the notification will never be delivered, because the service sessions between the EGO services and the WSG have not been rebuilt. This is a deadlock. Even worse, the resources allocated for the WSC will be neither used nor released.

The paper therefore designs a recovery table for the WSG. The table records the currently alive client sessions and the EGO services that each WSC is accessing. To ensure the security of the WSG, usernames and passwords cannot be stored in the recovery table. The WSC's endpoint for receiving notifications is therefore necessary: when the WSG needs a username/password to log on to EGO, it sends a request to the WSC to obtain them. After a restart, the recovery algorithm recovers all the existing client and service sessions.

The recovery algorithm consists of four parts, and the WSG executes a different part depending on its status. The WSG can receive normal requests after it restarts. At the same time, the main thread of the WSG creates a new thread dedicated to recovery. The recovery thread recovers all the WSCs that exist in the recovery table. On the WSC side, after receiving a request asking for its username/password, the WSC puts the client session's credential in the SOAP message header of the reply and sends the reply to the WSG.

The paper ends with a performance evaluation of the fault-tolerant recovery mechanism. It assesses, from the user's perspective, whether the mechanism is efficient and deadlock-free. Many test cases exercise the WSG's performance from different angles, and the test results are given.
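The relationship between client sessions and service sessions described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the class names, field names, and the `service_session` helper are all hypothetical, chosen only to show one client session owning several service sessions and closing them together.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceSession:
    """A session between the WSG and one EGO service (hypothetical fields)."""
    ego_service: str
    is_open: bool = True


@dataclass
class ClientSession:
    """A session between one WSC and the WSG (hypothetical fields)."""
    client_id: str
    service_sessions: dict = field(default_factory=dict)
    is_open: bool = True

    def service_session(self, ego_service: str) -> ServiceSession:
        # Set up a service session on first use of an EGO service, then reuse it.
        if not self.is_open:
            raise RuntimeError("client session is closed")
        if ego_service not in self.service_sessions:
            self.service_sessions[ego_service] = ServiceSession(ego_service)
        return self.service_sessions[ego_service]

    def close(self) -> None:
        # Closing the client session closes every corresponding service session.
        for session in self.service_sessions.values():
            session.is_open = False
        self.is_open = False
```

Keeping the service sessions inside the client session makes the cascade on close a single loop, which mirrors the rule that service sessions never outlive the client session they serve.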
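The recovery table and the dedicated recovery thread can be sketched in the same way. The abstract does not spell out the four parts of the algorithm, so the code below does not attempt to reproduce them; it only shows a table that stores no username or password, a recovery pass that asks each WSC for its credential through its notification endpoint, and per-client isolation of failures. `RecoveryEntry`, `recover_all`, `start_recovery`, and the two callbacks are assumptions made for illustration.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryEntry:
    """One row of a hypothetical recovery table. No username/password is stored."""
    client_session_id: str
    notify_endpoint: str   # where the WSC receives notifications
    ego_services: tuple    # EGO services the WSC was accessing


def recover_all(table, ask_client, rebuild_service_session):
    """Recover every WSC in the table, isolating failures per client.

    ask_client(endpoint) and rebuild_service_session(credential, service) are
    hypothetical callbacks standing in for the SOAP exchange with the WSC and
    the log-on to an EGO service.
    """
    results = {}
    for entry in table:
        try:
            credential = ask_client(entry.notify_endpoint)
            for service in entry.ego_services:
                rebuild_service_session(credential, service)
            results[entry.client_session_id] = "recovered"
        except Exception:
            # One unreachable WSC must not block recovery of the others.
            results[entry.client_session_id] = "failed"
    return results


def start_recovery(table, ask_client, rebuild_service_session):
    """Run recovery on a dedicated thread so the main thread can keep serving."""
    outcome = {}

    def run():
        outcome.update(recover_all(table, ask_client, rebuild_service_session))

    thread = threading.Thread(target=run, name="wsg-recovery", daemon=True)
    thread.start()
    return thread, outcome
```

A caller would start the thread right after restart and continue accepting normal requests; joining the thread is only needed when the caller wants to inspect the per-client outcome.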
Keywords/Search Tags: Fault-tolerant