Research On The Efficient Retrieval Methods For Large-scale Cloud Workflow Model Repositories

Posted on: 2017-01-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Huang
GTID: 1368330512486007
Subject: Software engineering

Abstract/Summary:
With the development of cloud computing, cloud service platforms are being adopted by a growing number of enterprises and individuals, and the underlying cloud workflow systems accumulate large volumes of business process data. Retrieving and recommending the process models most similar to a tenant’s requirements has become extremely important: it promotes reuse of existing model assets and helps reduce the error rate of the modeling process. However, cloud service platforms host a large number of tenant service systems with the same or similar business backgrounds and application scenarios. Consequently, cloud workflow model repositories contain many process models whose nodes carry the same or similar labels, and many process fragments with the same substructure or similar behavior. Traditional retrieval technology therefore cannot retrieve the most similar processes efficiently, and efficiently querying large process model repositories in a cloud workflow system remains challenging.

In this dissertation, we first analyze the advantages and disadvantages of existing process retrieval approaches. Then, according to the characteristics of cloud workflow model repositories, we propose a series of approaches, based on graph structure and on process behavior, that improve query efficiency on large process model repositories in cloud workflow systems. Finally, a prototype is implemented to validate the proposed methods. Specifically, this dissertation covers the following aspects:

1) An improved two-stage approach, using the filtering-verification framework, for exact process retrieval based on graph structure:
a) a composite task index is constructed by combining the label, join-attribute and split-attribute of a task to improve the filtering capability;
b) a novel subgraph isomorphism algorithm based on task code, which uses both the neighborhood information and the structural features of a task node, is presented to refine the candidate model set more efficiently;
c) extensive experiments over synthetic and real datasets are conducted to demonstrate the effectiveness and efficiency of the proposed approach.

2) An improved two-stage approach, using the filtering-verification framework, for exact process retrieval based on behavior:
a) an algorithm that calculates ordering relations under the time constraints of process nodes is put forward to improve the filtering ability of the index;
b) based on the ordering relations with time constraints between tasks, a process behavior similarity computing algorithm and a process behavior matching algorithm are proposed;
c) based on the above algorithms, a two-stage behavior-based process retrieval approach is proposed to improve retrieval efficiency.
Finally, experiments on synthetic and real data are conducted to validate the effectiveness and efficiency of the proposed approach.

3) A series of parallel process retrieval approaches based on data partitioning, built on the above algorithms:
a) to improve retrieval efficiency further, two data partitioning modes, equipartition and clustering-based partitioning, are proposed to divide large-scale process model repositories into small pieces that can be searched in parallel;
b) based on these two partitioning modes, four process retrieval algorithms are proposed to accelerate large-scale process retrieval: static and dynamic parallel retrieval, each over model sets partitioned either uniformly or by automatic clustering;
c) experiments on a large-scale simulated process model library and an actual cloud workflow model repository are conducted to evaluate the efficiency of the four parallel retrieval algorithms.

4) A modeling-language-independent cloud workflow model (CWF). A CWF-based process retrieval framework is proposed that meets different users’ retrieval needs (based on text, graph structure or behavior) and supports multiple model formats, so that large-scale cloud workflow model repositories can be retrieved efficiently. Finally, a prototype tool is implemented to validate the above algorithms.
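
As an illustration of the filtering stage in aspect 1), the following minimal Python sketch builds a composite task index keyed on a task's label, join-attribute and split-attribute, then intersects posting sets to obtain candidate models. It is not the dissertation's implementation: the names `Task`, `composite_key`, `build_index` and `filter_candidates`, the attribute values ("AND", "XOR", "NONE"), the sample models, and the choice of exact matching on the full triple are all assumptions made for this example. The verification stage (subgraph isomorphism on the surviving candidates) is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A process node; the field values used below are hypothetical."""
    label: str
    join: str   # join-attribute, e.g. "AND", "XOR", "NONE"
    split: str  # split-attribute, e.g. "AND", "XOR", "NONE"


def composite_key(task):
    # Assumption: the composite key is simply the (label, join, split) triple.
    return (task.label, task.join, task.split)


def build_index(repository):
    """Map each composite key to the set of model ids containing such a task."""
    index = defaultdict(set)
    for model_id, tasks in repository.items():
        for task in tasks:
            index[composite_key(task)].add(model_id)
    return index


def filter_candidates(index, query_tasks):
    """Return ids of models that contain every composite key of the query.

    This is only the filtering stage: survivors still need verification.
    An empty query returns an empty set in this sketch.
    """
    candidates = None
    for task in query_tasks:
        posting = index.get(composite_key(task), set())
        candidates = set(posting) if candidates is None else candidates & posting
        if not candidates:
            return set()
    return candidates if candidates is not None else set()


# Hypothetical repository of three small models.
repository = {
    "m1": [Task("Receive order", "NONE", "AND"),
           Task("Check stock", "NONE", "NONE"),
           Task("Ship goods", "AND", "NONE")],
    "m2": [Task("Receive order", "NONE", "XOR"),
           Task("Check stock", "NONE", "NONE")],
    "m3": [Task("Receive order", "NONE", "AND"),
           Task("Ship goods", "AND", "NONE")],
}
index = build_index(repository)
query = [Task("Receive order", "NONE", "AND"), Task("Ship goods", "AND", "NONE")]
candidates = filter_candidates(index, query)
print(sorted(candidates))
```

In this toy repository, a label-only index would keep `m2` as a candidate for a query on "Receive order", whereas the composite key excludes it because its split-attribute differs. This is the kind of additional pruning the composite index is meant to provide when many models share the same labels.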
Keywords/Search Tags: cloud workflow, process retrieval, process similarity, composite index, parallel retrieval