
Study On Fault-Tolerant Execution Model And Implement Methods In The Migrating Workflow System

Posted on: 2010-02-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C X Lu
Full Text: PDF
GTID: 1118360278474021
Subject: Computer software and theory
Abstract/Summary:
The migrating workflow is a workflow management technology based on mobile agents. The agent that performs tasks, called a migrating instance, is built on the mobile agent paradigm. A work place maps to a network node of a workflow participant together with the services that node offers. Network nodes are the working sites of migrating instances, and the services they provide include runtime services and workflow services. A migrating instance can use local resources to perform one or several tasks at one work place. If necessary, it can migrate, carrying its task list and current results, to another suitable work place and continue its work there. Migrating instances created for a common migrating workflow can work collaboratively to manage parallel business processes.

The migrating workflow management system is composed of a migrating workflow management machine and several work places linked by trust relations. The migrating workflow management machine organizes, manages and supervises the workflow on behalf of the workflow sponsor. A work place, representing an organization or corporation that participates in the collaborative work, provides services for migrating instances. The workflow engine, located at the workflow management machine, supports management of the workflow alliance, definition of business processes, and the creation, dispatching and monitoring of migrating instances. If a business flow is divided into several parallel business processes, each process is performed by one migrating instance, so multiple migrating instances performing a common business flow can be created in parallel. A work place, comprising a docking station and a work host network, is the working location of migrating instances. It receives queries and requests from migrating instances and provides the running environment and runtime services when a migrating instance arrives. Moreover, it requests data services and functional services on behalf of migrating instances.
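The migration behaviour described above (an instance performs what it can at one work place, then moves on with its task list and current results) can be illustrated with a minimal Python sketch. The class names `WorkPlace` and `MigratingInstance`, the `services` dictionary and the first-match choice of the next work place are assumptions made for illustration only; the abstract does not specify the thesis's actual data structures or selection algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class WorkPlace:
    """A network node offering named workflow services (hypothetical structure)."""
    name: str
    services: dict  # service name -> callable taking the current results dict

    def can_perform(self, task: str) -> bool:
        return task in self.services


@dataclass
class MigratingInstance:
    """Carries its remaining task list and accumulated results between work places."""
    task_list: list
    results: dict = field(default_factory=dict)
    itinerary: list = field(default_factory=list)  # names of visited work places

    def run_at(self, place: WorkPlace) -> None:
        """Perform every leading task that this work place can serve."""
        self.itinerary.append(place.name)
        while self.task_list and place.can_perform(self.task_list[0]):
            task = self.task_list.pop(0)
            self.results[task] = place.services[task](self.results)

    def execute(self, places: list) -> dict:
        """Migrate between work places until the task list is empty."""
        while self.task_list:
            # Assumption: take the first work place that offers the next task.
            target = next((p for p in places if p.can_perform(self.task_list[0])), None)
            if target is None:
                raise RuntimeError(f"no work place offers {self.task_list[0]!r}")
            self.run_at(target)
        return self.results
```

In this sketch an instance with tasks `["quote", "book", "pay"]` would run `quote` at the first work place that offers it and then migrate, with the result of `quote` in hand, to a work place that offers `book` and `pay`.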
If the migrating workflow engine and the work place are deployed together, each workflow participant can organize and launch its own workflow, so multiple business processes can run in a common workflow system.

The running environment of a migrating instance is an inter-organizational network, so task execution is prone to uncertainty, e.g. host faults, channel failures, communication failures, and service or resource malfunctions. Faults and failures distort the execution of a migrating workflow; they can cause the death of a migrating instance or, even worse, the abortion of the whole migrating workflow. Hence fault tolerance of migrating instances is necessary to ensure the reachability, correctness and reliability of the migrating workflow. The fault tolerance of a migrating instance has three aspects: fault tolerance of execution, fault tolerance of communication and fault tolerance of state.

· Fault tolerance of execution: a migrating instance can perform tasks reliably at any work place. In a migrating workflow system, workflow tasks are performed by migrating instances that move from site to site and make use of local services. A work place provides not only a runtime environment but also reliable workflow services for migrating instances. Physical faults or logical malfunctions can disturb the normal execution of a migrating instance. In particular, for long transactional tasks that demand high reliability, such as booking or payment, transactional properties must be ensured because important databases are accessed. Hence a fault-tolerant scheme is indispensable for migrating instances.

· Fault tolerance of communication: the mails exchanged by migrating instances can be sent and delivered reliably. In a migrating workflow system, communication is the basis of cooperation. Cooperation can succeed only when mails are sent and delivered reliably.
Two factors can cause communication failure for a migrating instance: (1) physical faults of the communication channel, which prevent a mail from being sent; (2) migration of the migrating instance, which prevents a mail from being delivered because instances move unpredictably, i.e., when a mail reaches its target, the receiver has already left. For physical faults, the mail can be resent through a backup channel; for delivery failures caused by the unpredictable movement of migrating instances, a location tracking and mail forwarding mechanism is needed.

· Fault tolerance of state: exceptional states of a migrating instance can be caught and recovered from. The states of a migrating instance are divided into regular states and exceptional states. An exceptional state means that the migrating instance cannot be tracked or is unavailable because of a physical fault or an attack. Business processes that execute in parallel often depend on one another in data or time. If one migrating instance enters an exceptional state, other migrating instances may become exceptional or blocked. Therefore, fault tolerance of state must not only catch exceptional states and recover from them in time, but also compute the affected scope and effectively restrict the spread of exceptional states.

This study is mainly supported by the National Natural Science Foundation of China under Grant No. 60473123 and No. 60573169, and is based on the migrating workflow framework. Drawing on research results from related fields, this thesis focuses on the fault tolerance model of the migrating workflow. Several implementation schemes are presented, including the fault-tolerant execution model, the reliable communication model of migrating instances, and the collaborative monitoring and coordinated recovery scheme. The results have been analyzed and validated through an experimental case. The main contributions of this thesis are as follows:

1. 
Research on the fault-tolerant execution model of the migrating workflow. To perform workflow tasks reliably, this thesis presents a fault-tolerant execution model of workflow. The model has a hierarchical structure made up of a service layer, an instance layer and a coordination layer. The possible faults of the three layers are described and a fault-tolerant implementation scheme is established. Moreover, the framework of the fault tolerance model is given, the experimental case is devised, and the experimental environment is set up; these form the basis of the later chapters.

2. Research on the fault-tolerant execution model and implementation of migrating instances. To implement fault-tolerant execution of migrating instances, workflow tasks are divided into two types: time-critical tasks (TCT) and business-critical tasks (BCT). The former are short transactional tasks with strict response-time requirements, e.g. real-time data processing and online software updating; the latter are long transactional tasks requiring high reliability, e.g. booking, payment and money transfer, for which transactional properties must be ensured when modifying an important database. This thesis provides a fault-tolerant stage construction model based on the space replication method and defines the notions of dynamic stage and dynamic priority. Moreover, the stage work place selection algorithm and the dynamic stage construction algorithm are implemented. Performance analyses and experimental results show that the model reduces the time and communication costs of stage submission and hence improves the efficiency of fault-tolerant execution.

3. 
Research on the fault-tolerant communication model and implementation of migrating instances. To implement fault-tolerant communication of migrating instances, we studied a reliable communication model based on service domains and a "postoffice-mailbox" mode. The communication model targets mail delivery failures caused by the unpredictable movement of migrating instances. This thesis describes the definition of the communication model, the corresponding system framework, and the naming and address-locating scheme, and proposes the main communication algorithms. Moreover, the characteristics and communication efficiency of the model are analyzed. The experimental results show that the communication scheme is simple, reliable and efficient.

4. Research on the state monitoring and coordinated recovery model and its implementation. To implement fault tolerance of state, we studied a collaborative monitoring model for a collaborative, parallel process with multiple executing agents, together with a corresponding checkpoint algorithm. The model targets the inconsistent states and execution blocking caused by the exceptional state of a migrating instance. The collaborative monitoring model and the monitor management algorithms are presented, and the process of capturing and handling information is described. Moreover, a checkpoint method based on the monitoring model is provided. Performance analyses and experimental results show that the model monitors migrating instances effectively and can recover from failure to a consistent state by coordinating monitors.

The main innovative contributions of this thesis are:

1. To avoid execution blocking caused by the failure of a work place, a fault-tolerant execution stage construction model based on space replication is provided. 
The model optimizes the efficiency of stage construction, reduces time and communication costs, and improves the usability of the model. It can plan the task execution of a migrating instance according to the execution capability of each work place, avoid unnecessary revisits to work places, and shorten the total running time of the migrating instance. Moreover, a method for evaluating work places, called dynamic priority, is defined. The priority of a work place differs for different migrating instances and at different times, so the dynamic priority reflects how well a work place suits a given migrating instance as its runtime environment. In addition, a work place selection algorithm is provided to select the most suitable work places for a migrating instance while reducing the communication costs of stage submission.

2. To address communication failures caused by network faults or the unpredictable movement of migrating instances, a reliable communication model based on service domains is presented. Compared with traditional communication methods for mobile agents, the model is easy to use, reliable, efficient and adaptable to larger systems. The communication model divides the work places into several service domains, each of which has a postoffice that hosts the mailboxes of migrating instances. Each migrating instance has two mailboxes: a source mailbox and an active mailbox. The source mailbox stays at the home postoffice, while the active mailbox travels along with the migrating instance. An address_book is kept at each postoffice to cache the addresses of migrating instances that have been communicated with, for use in later queries.
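The postoffice and two-mailbox arrangement just described can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's algorithm: the `Postoffice` and `ActiveMailbox` classes, the `domain/name` naming convention, the in-process `registry` standing in for the network, and the message-id based duplicate filter are all hypothetical. The sketch shows a sender consulting its local address_book first and falling back to the home postoffice only when the cached address is missing or stale.

```python
class ActiveMailbox:
    """Travels with the instance and actually receives mail."""

    def __init__(self, owner):
        self.owner = owner
        self.messages = []
        self.seen_ids = set()

    def deliver(self, msg_id, body):
        if msg_id not in self.seen_ids:  # drop duplicates caused by resends
            self.seen_ids.add(msg_id)
            self.messages.append(body)


class Postoffice:
    """One postoffice per service domain (all structure here is hypothetical)."""

    registry = {}  # domain name -> Postoffice; stands in for the network

    def __init__(self, domain):
        self.domain = domain
        self.home_pointers = {}  # source mailboxes: instance id -> current domain
        self.active = {}         # active mailboxes currently hosted in this domain
        self.address_book = {}   # cache: instance id -> last known domain
        Postoffice.registry[domain] = self

    def create_instance(self, name):
        inst_id = f"{self.domain}/{name}"  # assumed naming: home domain is the prefix
        self.home_pointers[inst_id] = self.domain
        self.active[inst_id] = ActiveMailbox(inst_id)
        return inst_id

    def migrate(self, inst_id, target_domain):
        """Move the active mailbox and update the pointer kept at the home postoffice."""
        amb = self.active.pop(inst_id)
        Postoffice.registry[target_domain].active[inst_id] = amb
        home = Postoffice.registry[inst_id.split("/")[0]]
        home.home_pointers[inst_id] = target_domain

    def send(self, inst_id, msg_id, body):
        """Use the cached address; on a miss or stale entry, ask the home postoffice."""
        domain = self.address_book.get(inst_id)
        if domain is None or inst_id not in Postoffice.registry[domain].active:
            home = Postoffice.registry[inst_id.split("/")[0]]
            domain = home.home_pointers[inst_id]
            self.address_book[inst_id] = domain
        Postoffice.registry[domain].active[inst_id].deliver(msg_id, body)
        return domain
```

Note that in this sketch the home postoffice only answers location queries; the message itself goes straight to the domain that currently hosts the active mailbox, which is one plausible reading of how the model avoids triangular routing.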
The model has the following advantages: (1) every instance has two mailboxes; the hmb (the source mailbox at the home postoffice) plays a guiding role, while the amb (the active mailbox) is the component that actually receives messages, which ensures reliable delivery with the exactly-once property and reduces the bandwidth consumed by triangular routing and the overhead of registration and deregistration. (2) The transparent and efficient addressing strategy decreases addressing time and lessens dependency on the home, which makes the system more robust and scalable. (3) Fault tolerance: the model avoids message loss due to the unpredictable movement of migrating instances, and ensures in-order, exactly-once delivery of mail. Preliminary experiments show that the model satisfies the requirements of a migrating workflow system for reliability, adaptability and efficiency. Future work will focus on establishing a more secure system that can be applied in a more general environment.

3. To catch and recover from exceptional states, a hierarchical collaborating monitor model (HCM~3) is provided. The model can capture and handle the state of migrating instances and avoid workflow failure caused by the death of a migrating instance. The model treats the monitoring of the whole workflow as a coordinated parallel process, dispatches multiple monitors to collaboratively monitor all migrating instances performing a workflow, and catches, handles and recovers from exceptional states at different levels through the coordination of monitors. The model has the following merits: (1) Hierarchical. Monitors are related hierarchically, which makes it possible to tailor a monitor to each migrating instance, diagnose exceptions when they appear, and coordinate the monitors' work at different levels. (2) Parallel. Monitoring runs in parallel, and the system state is kept consistent through coordination among monitors. The model avoids a single point of failure and improves monitoring efficiency. (3) Reliable and highly efficient. 
The model is reliable because it distributes monitoring tasks across several monitors. At the same time, only one monitor is assigned to each migrating instance, which reduces the additional cost that redundant monitors would introduce.

Since the migrating workflow is an emerging field of workflow research, it is far from mature in both theory and application. To extend the study started in this thesis, the author proposes the following future work:

1. The HCM~3 needs further improvement. The HCM~3 is currently at an early stage and not very mature, because we made many assumptions when building the system and used a simple case for the experiments. Further work will refine the model by taking the complexity and dynamics of the application environment into account. In addition to qualitative analyses, we will carry out thorough quantitative analyses of many aspects of the model to obtain a more objective evaluation.

2. Research on target-oriented task decomposition and the migrating instance execution scheme will be undertaken. In this thesis, the flow decomposition and instance dispatching scheme is only a direct and simple division of the business flow, based on an explicit definition of, and complete knowledge about, the business flow. Adaptability is achieved by binding the implementation details to the migrating instance at runtime. Further work will study a target-oriented workflow scheme, which can lessen the dependency on the designer's knowledge and awareness of the workflow and hence improve usability.
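As an illustration of the state fault tolerance discussed in the abstract (one monitor per migrating instance, coordination at a higher level, and computation of the scope affected by an exceptional instance), the following Python sketch may help. It is not the HCM~3 design itself: the heartbeat-timeout detection rule, the `Monitor` and `Coordinator` classes, and the `dependents` mapping used to represent data or time relevance between instances are assumptions introduced here for illustration.

```python
from collections import deque


def affected_scope(dependents, failed):
    """Return every instance that transitively depends on the failed one.

    dependents maps an instance id to the ids that consume its data or
    wait on its timing (a hypothetical representation of "relevance").
    """
    scope, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, ()):
            if nxt not in scope and nxt != failed:
                scope.add(nxt)
                queue.append(nxt)
    return scope


class Monitor:
    """Watches exactly one migrating instance through heartbeat timestamps."""

    def __init__(self, instance_id, timeout):
        self.instance_id = instance_id
        self.timeout = timeout
        self.last_seen = 0.0

    def heartbeat(self, now):
        self.last_seen = now

    def is_exceptional(self, now):
        # Assumption: an instance is exceptional if it has been silent too long.
        return now - self.last_seen > self.timeout


class Coordinator:
    """Upper-level monitor that coordinates the per-instance monitors."""

    def __init__(self, monitors, dependents):
        self.monitors = monitors
        self.dependents = dependents

    def check(self, now):
        """Map each exceptional instance to the set of instances it affects."""
        return {
            m.instance_id: affected_scope(self.dependents, m.instance_id)
            for m in self.monitors
            if m.is_exceptional(now)
        }
```

A coordinator built this way could suspend only the instances returned by `check` rather than the whole workflow, which is one simple way to restrict the spread of an exceptional state.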
Keywords/Search Tags: workflow management, migrating workflow, migrating instance, fault tolerance, state of monitor, recovery