Design and construction of an entity resolution system that supports entity identity information management and asserted resolution

Posted on: 2012-03-05
Degree: Ph.D.
Type: Dissertation
University: University of Arkansas at Little Rock
Candidate: Nelson, Eric Derrand
Full Text: PDF
GTID: 1458390011452405
Subject: Information Technology
Abstract/Summary:
This work describes the design and construction of an open-source entity resolution (ER) system that enables users to assign and maintain persistent identifiers for master data items. Two key features of this system that are not available in current ER systems, and that make persistent identification possible, are (1) the capture and management of entity identity information and (2) support for user-directed asserted resolution to complement automated direct matching and transitive closure.

Another important feature of the design is that the system can be easily configured at run-time into any one of four types of entity resolution architecture: (1) traditional merge/purge, also known as record linking; (2) identity capture; (3) identity update; and (4) identity resolution.

Because these configurations can be established by the user at run-time, the system provides a valuable tool for academic research and instruction: researchers and students can use the same system to explore the behavior and nature of different ER architectures. The most common string-match comparators, such as Levenshtein edit distance, q-gram, Soundex, and many others, are built into the system, and users can easily add further comparators by extending the system's Comparator class. Furthermore, the system incorporates a dynamic filtering mechanism that improves the performance of the matching algorithm by skipping record pairs that cannot possibly match.
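
As an illustration of the ideas in the abstract, the following is a minimal Python sketch of an extensible comparator combined with direct matching and transitive closure. It is not the dissertation's implementation: only the name Comparator comes from the abstract, while the matches method, the EditDistanceComparator subclass, the resolve function, and the use of union-find for the closure step are assumptions made for this example. Records are reduced to single name strings to keep it short.

```python
from abc import ABC, abstractmethod
from itertools import combinations


class Comparator(ABC):
    """Hypothetical base class: decides whether two attribute values match."""

    @abstractmethod
    def matches(self, a: str, b: str) -> bool:
        ...


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


class EditDistanceComparator(Comparator):
    """Assumed example of a user-added comparator (case-insensitive)."""

    def __init__(self, max_distance: int = 1):
        self.max_distance = max_distance

    def matches(self, a: str, b: str) -> bool:
        return levenshtein(a.lower(), b.lower()) <= self.max_distance


def resolve(records, comparator):
    """Direct pairwise matching followed by transitive closure (union-find)."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the cluster root, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if comparator.matches(records[i], records[j]):
            parent[find(j)] = find(i)  # merge the two clusters

    clusters = {}
    for i, rec in enumerate(records):
        clusters.setdefault(find(i), []).append(rec)
    return sorted(clusters.values())


if __name__ == "__main__":
    names = ["Jon", "John", "Johns", "Mary"]
    print(resolve(names, EditDistanceComparator(max_distance=1)))
```

In this sketch "Jon" and "Johns" are two edits apart and do not match directly, but both match "John", so transitive closure places all three in one cluster. A filtering step of the kind the abstract mentions would sit in front of the pairwise loop and skip pairs that cannot match; it is omitted here.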
Keywords/Search Tags:System, Entity resolution