Font Size: a A A

Research On Resource Searching And Load Balancing In P2P Workflow Systems

Posted on:2011-10-30Degree:DoctorType:Dissertation
Country:ChinaCandidate:L GaoFull Text:PDF
GTID:1118330332981353Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Workflow is the automation of a business process, in whole or part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules. Workflow applications are inherently distributed, or even decentralized. That is, workflow can be viewed as an efficient synergetic paradigm, where all kinds of computing resources distributed in the network can be organized by an automatic process to provide a consistent service. So workflow has been one of the most important domains of parallel and distributed computing during the past two decades.However, problems such as potential poor performance, lack of reliability, limited scalability, insufficient user support, and unsatisfactory system openness are largely ignored. These problems are mainly caused by the mismatch between the distributed nature of workflow applications and the centralized management which is used by the tranditional workflow systems based on client-server or migrating agent architecture. Instead, the peer-to-peer (P2P) infrastructure is used to provide genuinely decentralized workflow support, which removes the centralized data repository and control engine from the system. Consequently, both data and control are distributed so that workflow functions are fulfilled through the direct communication and coordination among the relevant peers. With the support of this approach, performance bottlenecks are likely to be eliminated while increased resilience to failure, enhanced scalability, better user support and a more open framework for service-oriented workflow. So P2P workflow systems have been recognized as one of the most strategic future directions of workflow systems.Based on the above discussion, the fact can be revealed that comparing with the centralized systems, the significantly improved performance of P2P workflow systems originates from transforming the potential bottleneck caused by centralized engine into the increased control complexity caused by the full-distributed self-orgnized architecture based on P2P networks. So the coordination efficiency among peers is the key factor restricting the performance of P2P workflow systems, and a kind of efficient peer coordination strategy is the core of supporting P2P workflow functions. The running time of workflow instance (WI) in P2P workflow system is one of the most important measurements of the coordination efficiency. From this perspective, the optimal path planning (OPP) problem which focuses on constructing the optimal running path of WI with the minimum running time is the key problem restricting the performance optimization of P2P workflow. In the P2P network with dynamic parameters and decentralized topology, OPP problem shows three obvious properties: firstly, the computing resource, load status, communication bandwidth and topology of network are dynamically updated; secondly, each peer only keeps the local view about directly connected neighbors and works with each other by means of self-organization, because for the decentralized P2P workflow systems, especially the complex systems with numerous dynamic participants, maintaining the global information in a peer is unrealistic; thirdly, according to the workflow model, there are determinate logical relationships among peers so that all of running pathes have determinate directions.These propertis mentioned above call for new approaches to a new type of OPP problem, which is refered to as distributed OPP (D-OPP) problem. In short, D-OPP problem is to find the global optimal running path from a decentralized and dynamically evolving network with large scale of peers, according to the local views. Because of the partially observable information of peers and determinate logical directions of D-OPP problem, the running time of WI mainly consists of not only the task execution time (TET), but also the peer search time (PST) which can't be ignored. Thus, D-OPP problem can fall into two sub-problems:firstly, decentralized resource searching problem for P2P workflow systems, corresponding to optimizing PST; secondly, decentralized dynamic load balancing problem, corresponding to optimize TET. The combination of the optimal solutions of above two sub-problems at each execution step yields the global optimal solution of D-OPP problem.This study focuses on the optimization problem of the decentralized resource searching and dynamic load balancing in P2P workflow systems. These research findings of this thesis not only guarantee the outstanding performance and flexible architecture of P2P workflow systems, but also powfully support the integration of distributed computing resources and the deployment of distributed computing systems in large application scale.The main works of this thesis are described as follows: 1. D-OPP problem formulation and analysis.The main characteristic of P2P workflow is the full-distributed control architecture, which can be viewed as a transfer process of WI among all of sets of services by means of self-orgnization according to the local logic relationship, based on service clustering. Thus, D-OPP problem is inherently hierarchical, which consist of inter-domain and intra-domain of set of services, repectively corresponding to the decentralized resource searching problem and decentralized dynamic load balancing problem. In chapter 2, the architecture of P2P workflow systems is described from the perspective of service clustering. Based on the architecture, D-OPP problem is formulated and analyzed by means of Markov chain, and the structure of the optimal solution is provided.2. Research on decentralized dynamic resource searching network for P2P workflow. Aimming at the decentralized resource searching problem, give the optimal solution with the minimum steps.The decentralized resource searching problem call for a new approach to location the exact service set the workflow process requires in P2P network without any routing center. For P2P network, the structured approaches have better routing complexity comparing with the unstructured approaches. So the structured network is introduced into workflow systems to organize the service sets as structured inter-domain network so that the optimal searching complexity can be satisfied. In chapter 3, propose a service addressed network (SAN) and a spanning graph location network (SGLN) based on logic mapping coding (LMC). SAN can be viewed as the superposition of multiple CAN networks and efficiently low searching complexity and bandwidth with 100% location precision, comparing with the representative P2P workflow networks. Based on LMC, SGLN achieves 0(1) routing complexity without any extra routing load and redundant topology by mapping the workflow model into the topology of P2P network, which meets the optimal solution of the decentralized resource searching problem. At the same time, SGLN presents multiple parallel running channels and the less space complexity.3. Research on P2P workload balancing network.To solve the decentralized dynamic load balancing problem, there are two key factors:(1) the foundation network which can reflect the load status of system in real-time; (2) the optimal randomized sampling algorithms based on the foundation network. Workload balancing network satisfies the first factor mentioned above, which directly support the efficient decentralized dynamic load balancing. In chapter 4, a P2P randomized network, called workload balancing network (WBN), is proposed, which avoids the network churn by mapping capacity of peers into the topology of random graph. Extensive simulations show that WBN exhibits three characteristics:firstly, capacity relativity of topology; secondly, high capacity aggregation; thirdly, small network diameter independent on system scale. These characteristics directly support the decentralized dynamic load balancing can converge at the global optimal peers with stable short random walks.4. Research on the decentralized dynamic load balancing algorithm. Aimming at the decentralized dynamic load balancing problem, give the optimal solution with the minimum mixing time (random sampling steps).the decentralized dynamic load balancing problem is solved mainly by a randomized sampling algorithm based on the foundation network, which has to satisfying three conditions:firstly, there is no scheduling center; secondly, least sampling steps; thirdly, converge at the global optimal execution peers with large probability. In chapter 5, based on WBN, a heuristic randomized sampling algorithm (HRS), called WBN algorithm, is proposed, which meets the optimal solution of the decentralized dynamic load balancing problem. Based on WBN network, WBN algorithm is designed to randomly sample the peers with less duty ratio which is the foundation of constructing heuristic factors to distribute loads. Extensive simulations show that this algorithm will converge at the global optimal peers with probability 1 and stable short random walks about 4 expected steps in large-scale networks with a wide spectrum of load status. And the global load distribution is in proportion to the capacity of peers, which leads to the optimal system throughput.5. Research on topology-agnostic unstructured path planning algorithm.Corresponding to two sub-problems of D-OPP problem, SGLN network and WBN algorithm respectively which all belong to structured approaches meet the optimal solutions by constructing specific topologies. But for arbitrary network topology, there is no unstructured searching or load balancing algorithms meeting the optimal solutions of the problems mentioned above. So a new type of topology-agnostic unstructured path planning approaches is required, which has to satisfy three properties:firstly, decentralized approach; secondly, topology-agnostic; thirdly, the optimized path planning. In chapter 6, a randomized token distribution (RTD) algortithm is proposed. RTD algorithm achieves one hop routing from the request peers to the set of target peers on any network topology by actively distributing token which can be view as the advertisements of services to the set of peers which request the services. Extensive simulations test and support the efficiency of RTD algorithm.6. The optimal solution of D-OPP problem.SGLN network meets the optimal solution of the decentralized resource searching problem, WBN algorithm meets the optimal solution of the decentralized dynamic load balancing problem, and the combination of SGLN and WBN provides the optimal solution of D-OPP problem, which achieves 5 hops routing complexity (1 hop on SGLN network and 4 hops on WBN network) at each execution step of P2P workflow system.The main innovations of this thesis are described as follows:1-Propose a structured P2P network, called service addressed network (SAN), to provide optimized decentralized inter-domain resource searching.Comparing with the representative P2P workflow networks, SAN efficiently lows searching complexity and bandwidth with 100% location precision.2. Propose a structured P2P network, called spanning graph location network (SGLN), which meets the optimal solution of the decentralized resource searching problem of D-OPP.Based on LMC, SGLN achieves 0(1) routing complexity without any extra routing load and redundant topology by mapping the spanning graph into the topology of P2P network. At the same time, SGLN presents multiple parallel running channels and the less space complexity.3. Propose a dynamic random network, called Workload Balancing Network (WBN). Based on WBN, propose a heuristic randomized sampling algorithm, called WBN algorithm, which meets the optimal solution of the decentralized dynamic load balancing problem of D-OPP. WBN directly supports the decentralized dynamic load balancing to converge at the global optimal peers with stable short random walks. WBN algorithm will converge at the global optimal peers with probability 1 and stable short random walks about 4 expected steps in large-scale networks with a wide spectrum of load status. And the global load distribution is in proportion to the capacity of peers, which leads to the optimal system throughput.4. Propose a randomized token distribution (RTD) algortithm to provide a topology-agnostic satisfactory solution of D-OPP problem.RTD algorithm achieves one hop routing from the request peers to the set of target peers on any network topology by actively distributing token which can be view as the advertisements of services to the set of peers which request the services.Scince the P2P workflow is a topic in terms of broad research fields, the following future works are proposed to further the study started in this thesis:Online paging problem. But considering the storage of the indices of massive data in limited space, an index selecting strategy has an apparent impact on the system performance, that is, the online paging problem. For all of searching algorithms of this thesis, Local index can be created to improve the searching performance and reliability of system.Topology matching problem. This problem refers to the mismatching between the topology of overlay network and physical network. For all of searching and load balancing algorithms of this thesis, better topology matching will significiently improve the performance of P2P workflow systems.
Keywords/Search Tags:P2P workflow, P2P network, decentralized resource searching, decentralized dynamic load balancing, optimization
PDF Full Text Request
Related items