Information systems such as ERP, CRM and Workflow Management Systems have been used by companies or enterprises to support the execution of their business processes. These information systems typically support the record of event logs which provide detailed information about the execution of processes. One of the purposes of process mining is to discover the process model from an event log. The discovered model either shows the connection of reality and original model, or indicates the conformance of the process and the event log, which can be used to discover, monitor and improve the original process.Process mining can be divided into three types:process discovery, conformance checking and process enhancement. Based on the types of data in an event log, different perspectives of process mining can be discovered. The main three perspectives are control-flow perspective, case perspective and organizational perspective. This thesis focused on the control-flow perspective process discovery. By taking into account the typical limitations of existing mining algorithms, the main contributions of this thesis include the following contents:1. Current process mining algorithms all based on complete event logs. Therefore the performance of one algorithm is affected by the quality of the event logs. Based on the incomplete event log, this thesis proposed an Expectation-Maximization approach to estimate the first-order Markov model that represents the process model extracted from an unlabeled event log where the case id is missing. Based on complete log, the first-order Markov transition matrix of the log was established, and the ordering relations identification rules has been defined, based on these, the structures mining algorithm has been developed.2. Based on the limitations of the α-algorithm, an improved a-algorithm was proposed. The method analyzes the specifications of invisible tasks, short loops, and repetitive tasks based on synchronization manager workflow net, and then provides the judgment theorems of these workflow patterns. Based on this, the enhanced α-algorithm and its theorem proving have been defined.3. According to the drawbacks of Genetic process mining algorithm, two improved genetic algorithms have been defined. Firstly, an improved genetic mining algorithm was proposed which includes new definitions of fitness function, specific crossover and mutation operators. Based on the pre-processing of the log and post-processing of the mined model, the quality of the mined model have been improved. Secondly, according to the low efficiency and high resource consumption problem of the Genetic mining algorithm, an asexual reproduction based pseudo-parallel genetic algorithm was proposed. The asexual reproduction avoids crossover operators destroying nice gene patterns and the pseudo-parallel algorithm employs island model for information exchanging between subgroups.For verifying the feasibility of the methods and technologies proposed in this thesis. the improved algorithms have been implemented as plugins of the ProM framework. and the Markov based algorithm was realized through a mining prototype system. A lot of synthetic logs were used to evaluate the effectiveness of the proposed algorithms. |