Process mining is also called workflow mining,which is a new application of data mining in workflow management area. At present, most enterprise has built ERP,CRM, SCM, WFMS and so on. The workflow logs recorded by these systems provide data foundation for workflow mining. Process mining targets the automatic discovery of information from an event log. This discovered information can be used to deploy new systems that support the execution of business processes or as a feedback tool that helps in auditing, analyzing and improving already enacted business processes.We distinguish three different perspectives: the control flow perspective, the organizational perspective and the case perspective.Current, most studies focus on the control flow perspective.The main aim of control flow mining is mining relations between tasks in an event log to find a good characterization of all paths in the log. Most algorithms have problems in dealing with duplicate activities, hidden activities, non-free-choice constructs, etc. In addition, real-life logs contain noise and are typically incompleteness. Most algorithms can't deal with noise and incompleteness. The main cause is that most of these algorithms are based on local information between activities in event log or the representations these algorithms use don't support these constructs. To resolve these problems, a kind of genetic algorithm for process mining has been proposed.this genetic algorithm use casual matrix to represent process model and this representation support all control flow constructs except duplicate activities.The fitness of the algorithm contains the completeness requirement and the preciseness requirement.This fitness can reflect how much a individual fit an event log and can well guide the gentetic algorithm. Although the genetic algorithm can mine all constructs except duplicate task, but it need many generations to search a better process model. So in this paper we put forward an improved genetic algorithm to improve efficiency. This algorithm can mine a better process model in same generation.According to the character of genetic algorithm,we combined five criterions for evaluating the quality of mined mode in process mining area to evaluate the results of our algorithm.The five criterions can exhaustively reflect the quality of mined process model.The main work and results included in this paper are as follows:(1)We use problems encountered during the log replay to select the crossover point.To crossover operator, crossover point is selected randomly in current algorithm. Problems encountered during the log replay could be used to select a good crossover point. Crossover should make the excellent individual's character be inherited in next generation. When doing crossover to two individuals, for excellent individual we select the activity as crossover point which has problem during using the individual parse the event log. the excellent character in the excellent individual can be keep down in next generation and character of activity which has problem may be improved.This,wo have a higher probability to get the most excellent individual.(2)We distinguish two kinds of dependencies: direct dependency and indirect dependency, we analysis the character of the two kinds of dependencies,and give out the method to identify the two kinds of dependencies.Direct dependency is based on local information in event log.if two activities have direct dependency,the dependency measure between the activities must be bigger than zero.if activity A and activity B has indirect dependency,when B appears ,A must appear before B and A can not be directly succed by B.(3)We use dependency relation to make a heuristics to guide mutuation.When we know all activities (Add) which probably has dependency relation with activity A.when do mutuation to activity A, if we select the operator inserting an activity to input/out of A.we can randomly select an activity from Add.if we do this, the search space will shrink.(4)We analysis time complexity of genetic algorithm and give out a realization of improved genetic algorithm which do not change the time complexity of original algorithm.Time complexity of genetic algorithm is determined by population size,maximum running generation and the event log size.To not add time complexity level of genetic algorithm,we compute the predecessor , the successor , the direct predecessor and the direct successor. When computing fitness,we compute the number of problem of every activity appearing in the time of using a individual to prase event log.(5)We have combined five criterions for evaluating the quality of mined model in process mining area to evaluate the results of our algorithm.Fitness of genetic algorithm in process mining area contains two parts: the completeness and the preciseness. The preciseness is detrtmined by the population the individual exist in. so the same individual in different population may have different fitness.so, using fitness to evaluate results of genetic algorithm is not reasonable. according to the character of genetic algorithm,We combined five criterions for evaluating the quality of mined mode in process mining area to evaluate the results of our algorithm. The five criterions are the completeness requirement (PFcomplete), the behavioral precision (Bp), the behavioral recall (Br), the structural precision (Sp) and the structural recall (Sr).The PFcomplete quantifies how complete a mined model is. The Bp and Br measure how precise the mined model is. The Sp and Sr express if the mined model has an overly complex structure or not. These metrics are complementary and should be considered together during the experiments analysis.The closer the values of the five metrics are to 1, the better. The five criterions can exhaustively reflect the quality of mined process model.At last,In ProM framework, we use forty event logs containing all kinds of control flow constructs to our algorithms. We set the same running parameter for original algorithm and improved algorithm and use the five criterions to analyze the result and make graph for the five criterions.the result show the improved genetic algorithm can mine a better process model. In process mining area,control flow mining is the hotspot. The mining of complicated control flow constructs and the tackle of noise in event log is the difficulty. the genetic algorithm can mine all constructs except duplicate task and be robust to noise .but the efficiency of the algorithm is low, it need many generations to mine a better process model.In this paper, we have improved the efficiency of the genetic algorithm and make it mine better prcess model than original algorithm. We combined five criterions for evaluating the quality of mined mode to evaluate the results of genetic algorithm. We have evaluated the quality the process model genetic mined not only from behavior perspective but also from strcture perspective, which make us exhaustively know the quality of mined model. |