
Study of Task Partitioning and Scheduling Methods Based on Node Context of P2P Networks

Posted on: 2013-01-11  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Y W Song  Full Text: PDF
GTID: 1118330374980717  Subject: Computer application technology
Abstract/Summary:
Workflow is an integrated business process that can be completed automatically or semi-automatically by computers. It is a key research subject of common concern in computer science, automatic control, advanced manufacturing, and related areas. The core technology of workflow is enabling distributed computing resources and business activities to work cooperatively through the workflow management system, which can further be extended to exploit the potential computing power in the network.

In scenarios where workflows must process massive data or handle complex logic, such as communication, finance, insurance, weather forecasting, earthquake prediction, surveying and mapping, exploration, economic statistics, scientific computing, data mining, and business analysis, workflow execution efficiency is a problem of general concern. For example, in a billing system of the communication industry, the average monthly call volume of a medium-sized province is about 40-50 billion records, so centralized billing and charging for the whole province at the end of the month involves a huge workload. At this computing scale, the billing activity alone lasts about 6 hours even in a computing environment with multiple high-performance servers, and the whole process of call record collection, call record formation, billing, accounting, and so on takes about 48 hours. This problem directly leads to time-consuming workflow execution and a huge demand for IT infrastructure investment. Therefore, research on improving workflow execution efficiency has great practical significance.

P2P technology transforms the computing model from the master-slave type to the peer-to-peer type.
The core of network applications diffuses from centralized servers to edge computing devices, which effectively resolves the problems of single points of failure and load balancing and greatly improves the computing power of the whole network. By making full use of P2P network resources and computing power, workflows with heavy workloads can be decomposed and scheduled to preferred nodes for distributed computing. This improves both the utilization of available computational resources and the overall execution efficiency of workflows. Therefore, Task Partitioning and Scheduling (TPS) in a P2P network environment is a core problem of workflow research.

The task partitioning and scheduling problem in a P2P network environment has the following features: (1) the computing resources, communication capability, and load condition of each node, as well as the topology of the P2P network, change dynamically; (2) as a consequence of the first feature, the task processing time of each node cannot be calculated accurately, so it is difficult to take the workflow execution time as the task scheduling objective; (3) in a structured, decentralized P2P network, node information is only locally visible. A better solution is to enable the workflow management system to perceive the node context in the P2P network, namely to study and solve the Context-Based Task Partitioning and Scheduling (CB-TPS) problem.
The CB-TPS problem can be broken down into three sub-problems: (1) the task partitioning problem is to decompose complex activities with heavy workloads into sets of simple, fine-grained sub-tasks with small workloads, so as to reduce the computation of any single task; (2) the task scheduling problem is to assign the decomposed sub-task sets to appropriate nodes for execution, so as to achieve the global optimization goal of workflow execution efficiency; (3) the result merging problem is to merge the execution results of the sub-task sets so that the merged result is equivalent to the result of executing the workflow without task partitioning.

This paper focuses on task partitioning and scheduling methods in a structured, decentralized P2P network environment. The main results include:

1. A task partitioning and scheduling model based on P2P network node context (CB-TPS). This paper introduces the P2P network node context into the workflow system model, which enables the workflow management system to sense the P2P network environment and dynamically accomplish task partitioning and scheduling according to the node context, in order to achieve optimal operation efficiency. The basic idea of the model is to first obtain the computing power and communication capability of the nodes from the node context information, and then decompose and schedule the tasks according to node capability. Task partitioning can be carried out in two ways: data partitioning and function partitioning. When using data partitioning, we first estimate the nodes' computing power from the obtained node context information, and then dispatch matching tasks to each node according to its computing capability.
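The data partitioning idea described above, sizing each node's share of the data in proportion to its estimated computing power, can be sketched as follows. This is a minimal illustration under assumed inputs, not the dissertation's implementation; the `partition_by_capability` helper and the example coefficient values are hypothetical.

```python
def partition_by_capability(records, coefficients):
    """Split `records` into contiguous, non-intersecting chunks whose
    sizes are proportional to each node's capability coefficient
    (hypothetical stand-in for the node context evaluation coefficient)."""
    total = sum(coefficients)
    chunks, start = [], 0
    for i, c in enumerate(coefficients):
        if i == len(coefficients) - 1:
            end = len(records)  # last node absorbs any rounding remainder
        else:
            end = start + round(len(records) * c / total)
        chunks.append(records[start:end])
        start = end
    return chunks

# Three candidate nodes with capability coefficients in ratio 3 : 1 : 1
chunks = partition_by_capability(list(range(100)), [3.0, 1.0, 1.0])
sizes = [len(c) for c in chunks]  # the strongest node gets the largest share
```

Because the chunks are contiguous and non-intersecting, concatenating the sub-task results in order reproduces the unpartitioned result, which is the property the result merging step relies on.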
When using function partitioning, we first use the obtained node context information to estimate the task processing efficiency, set the directive factors, and so on, and then dispatch the tasks according to the estimated efficiency and the directive factors. Chapter 2 of this paper proposes definitions of the node context and the workflow automaton; on this basis it brings forward the concepts of context-affecting weighting factors, evaluation coefficients of node computing capacity, etc., together with their calculation methods, and finally establishes the formal description of the CB-TPS problem.

2. A function-partitioning-based TPS method. The function-partitioning-based TPS method is essentially a functional subdivision of specific workflow activities according to the business logic: the specific activities of the workflows are refined into a set of sub-task processes connected by a partial order, each of which corresponds to a particular business function. The scheduling of the partitioned sub-task processes can be solved by execution path planning methods such as list scheduling, task duplication scheduling, genetic algorithms, particle swarm optimization, the ant colony algorithm, etc. Chapter 3 of this paper introduces a node processing power evaluation coefficient, on the basis of the P2P network node context definition, to improve the ant colony algorithm. The capability evaluation index and the improved ant colony algorithm not only help to obtain parameters that are difficult to estimate, such as the workflow execution time, but also make the task scheduling algorithm more dynamic and adaptive.

3. A data-partitioning-based TPS method. The data-partitioning-based task partitioning method is essentially to subdivide the objects processed by the workflows, in order to reduce the scale of each sub-task and improve workflow execution efficiency.
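The ant colony improvement sketched in point 2, biasing the ants' node choice by a processing power coefficient and depositing pheromone equal to the reciprocal of the processing price, can be illustrated roughly as follows. All names, parameter values, and the simplified one-node-per-task assignment are illustrative assumptions, not the algorithm exactly as defined in Chapter 3.

```python
import random

def choose_node(pheromone, power, alpha=1.0, beta=2.0):
    """Pick a node with probability proportional to
    pheromone^alpha * capability^beta; the capability coefficient
    plays the role of the 'stimulating factor' heuristic."""
    weights = [(tau ** alpha) * (eta ** beta)
               for tau, eta in zip(pheromone, power)]
    return random.choices(range(len(weights)), weights=weights)[0]

def schedule(tasks, power, prices, iters=50, rho=0.5):
    """Assign each task (given as a workload) to a node; the pheromone
    deposited on a chosen node is the reciprocal of its processing price."""
    pheromone = [1.0] * len(power)
    best, best_cost = None, float("inf")
    for _ in range(iters):
        assign = [choose_node(pheromone, power) for _ in tasks]
        cost = sum(prices[n] * t for n, t in zip(assign, tasks))
        if cost < best_cost:
            best, best_cost = assign, cost
        pheromone = [(1 - rho) * tau for tau in pheromone]  # evaporation
        for n in assign:
            pheromone[n] += 1.0 / prices[n]                 # 1/price deposit
    return best, best_cost

random.seed(0)
assign, cost = schedule(tasks=[4, 2, 3], power=[5.0, 1.0, 2.0],
                        prices=[1.0, 4.0, 2.0])
```

Using the reciprocal of the price as pheromone means cheap, powerful nodes accumulate attraction quickly, which is how the improved algorithm steers tasks toward nodes with strong processing capability and low processing cost.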
Therefore, the smaller the grain size of the data partitioning is, the more P2P network nodes participate in computing, and the higher the workflow execution efficiency becomes. In Chapter 4, on the basis of the CAN network, we improve the flooding algorithm and the K-random-walk resource search algorithm. Both methods can rapidly find a preset number of computing nodes and, during the resource search, obtain the processing power coefficients of the candidate nodes for the task scheduling of the workflow management system. The workflow management system chooses a certain number of nodes as candidate computing nodes for the tasks and determines the task allocation intensity for each node according to its processing power coefficient, according to which the data is partitioned and dispatched to the corresponding nodes for processing. After all the decomposed sub-tasks have been executed, result merging is done by directly combining the execution results of the sub-tasks. Because the task partitioning follows the non-intersection principle of data partitioning, the direct merging of the sub-task execution results is equivalent to the result obtained without task partitioning. The merging method and its equivalence proof are given in Chapter 4.

The main innovations of this paper include:

1. We build a task partitioning and scheduling problem-solving model based on the P2P network node ontology and the finite state automaton of workflows. Compared with the key-value method, the markup language method, and so on, an ontology can better express the semantics of the P2P network node context and the dynamic change of nodes. Compared with WF-Net and Petri Net, a workflow finite state automaton is more flexible in expressing the uncertain events in P2P networks and in dynamically describing the partitioning and assembly processes of workflows.
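A K-random-walk style resource search of the kind described above can be sketched as follows: K walkers start from the querying node, hop across neighbor links for a bounded number of steps, and collect candidate nodes together with their processing power coefficients. The toy adjacency structure and coefficient values are invented for illustration; the actual improved algorithm in Chapter 4 runs on a CAN overlay.

```python
import random

def k_random_walk(graph, power, start, k=3, ttl=4, wanted=4):
    """Launch k walkers with a hop limit (ttl); stop early once `wanted`
    distinct candidate nodes have been found, recording each candidate's
    processing power coefficient alongside it."""
    found = {}
    for _ in range(k):
        node = start
        for _ in range(ttl):
            node = random.choice(graph[node])   # one hop along a neighbor link
            found.setdefault(node, power[node])
            if len(found) >= wanted:
                return found
    return found

# Toy overlay: node -> neighbor list, plus per-node power coefficients
graph = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 4], 4: [2, 3]}
power = {0: 1.0, 1: 2.5, 2: 0.8, 3: 3.1, 4: 1.7}
random.seed(1)
candidates = k_random_walk(graph, power, start=0)
```

Compared with plain flooding, the walkers visit far fewer nodes per query, and because each candidate is returned with its power coefficient, the scheduler can rank candidates and set allocation intensities without a second round of messages.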
Therefore, the ontology- and automaton-based task partitioning and scheduling problem-solving model adapts well to the P2P network environment.

2. Based on the node context and function partitioning, we propose an ant colony optimization algorithm for task scheduling. The improved ant colony optimization takes the node processing power index as the stimulating factor, and regards the reciprocal of the task processing price as the pheromone density along the path left by the ants. It not only embodies the adaptability of node-context-based task scheduling, but also better guides the scheduling algorithm to dispatch tasks to nodes with strong processing capability and low processing cost, in order to speed up convergence to the optimal solution.

3. We propose a task scheduling method based on fast node resource search and data partitioning. The improved flooding and K-random-walk resource search algorithms can quickly find a certain number of computing nodes and evaluate their processing capability based on the node context. The non-intersection principle of data partitioning ensures the independence of the data subsets, and the evaluation of node processing capability ensures that tasks are dispatched to the most powerful node set, improving the efficiency of task scheduling and execution.

Workflow partitioning and scheduling in the P2P network environment is a research topic covering a very wide range; the further work of this paper mainly includes:

1. Construction and optimization of the context-based P2P overlay network. The P2P network is a logical overlay network on the IP network or other physical networks. Its topology does not fully match that of the actual physical network, so P2P network routing is not necessarily optimal.
Thus we can adjust the logical overlay network through the context to optimize the routing information.

2. Supervision and control of the operation quality of workflows in P2P networks. In P2P networks, the environment of the workflows is complex, changeable, and heterogeneous. How to monitor the running status of workflows in real time and schedule them adaptively according to the node context information is an important technology for guaranteeing the stable operation of P2P workflows.
Keywords/Search Tags: Workflow, Task Partitioning, Task Scheduling, P2P Network, Node Context, Automaton