
The Integration And Repair Of Heterogeneous Event Data

Posted on: 2016-02-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X C Zhu
Full Text: PDF
GTID: 1108330503456160
Subject: Computer Science and Technology
Abstract/Summary:
Large enterprises now run many information systems, which continuously generate heterogeneous event data. According to a recent survey, this data has two main quality problems: 1) duplicate events, where two events with different names represent the same business activity, and 2) missing events, where the event log generated by an information system is inconsistent with the process specification, the set of constraints that guides event generation. Unless these two problems are addressed, further applications such as query and analysis over the event data may not yield meaningful results. To keep information systems running through an expected and healthy lifecycle, the correspondences between events should be investigated, and the missing events should be recovered by finding a minimum repair that is consistent with the specification. Because the data is heterogeneous, however, matching and repairing events is highly non-trivial. This dissertation studies approaches that gradually improve the quality of heterogeneous event data, depending on what kind of external information can be exploited. The main contributions are summarized below:

• When no external information is available, this dissertation presents a framework that evaluates similarities among different events. It employs a dependency graph extended with a virtual event to address the opaque-name problem and dislocated matching. Moreover, an iterative similarity function with an estimation method offers a tradeoff between accuracy and time cost. A heuristic approach for matching composite events is also developed.

• If interesting event patterns exist, this dissertation presents an event matching framework based on these patterns. The normal distance is employed to evaluate different event mappings, i.e., the event mapping with the highest normal distance is the optimal one. Besides the node and edge similarity often considered by existing methods, this dissertation also considers the similarity of event patterns, which makes the normal distance more discriminative. The matching framework uses an efficient A* search algorithm and supports incremental matching in a pay-as-you-go style.

• If the event data has a corresponding process specification, this dissertation presents a backtracking method to recover the missing events. The method avoids enumerating all possible occurrence orders of the events within parallel structures, which makes it more efficient at repair than the existing alignment approach. A branching index, an advanced bounding function, and local optimality are also developed to prune non-optimal results.
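The dependency graph mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the dissertation's exact definition, so the normalization here is an assumption: a node's weight is the fraction of traces that contain the event, and an edge's weight is the fraction of traces in which one event directly follows another. The function name `dependency_graph` and the sample log are hypothetical.

```python
from collections import Counter


def dependency_graph(traces):
    """Build a frequency-weighted dependency graph from an event log.

    Assumed definition (not taken from the dissertation):
      node_freq[e]      = fraction of traces that contain event e
      edge_freq[(a, b)] = fraction of traces where b directly follows a
    """
    node_count = Counter()
    edge_count = Counter()
    for trace in traces:
        # Count each event and each consecutive pair at most once per trace.
        node_count.update(set(trace))
        edge_count.update(set(zip(trace, trace[1:])))
    n = len(traces)
    node_freq = {event: c / n for event, c in node_count.items()}
    edge_freq = {pair: c / n for pair, c in edge_count.items()}
    return node_freq, edge_freq


# Hypothetical event log: each trace is an ordered list of event names.
log = [
    ["submit", "check", "approve"],
    ["submit", "check", "reject"],
    ["submit", "approve"],
    ["submit", "check", "approve"],
]
node_freq, edge_freq = dependency_graph(log)
```

Two logs from different systems would each yield such a graph, and event matching then compares nodes and edges across the two graphs rather than comparing event names, which is why opaque names matter less under this representation.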
Keywords/Search Tags:business process management, data quality, data integration, data repairing, heterogeneous event