Effective entity resolution methodology for improving data quality and reliability of service-oriented applications

Posted on: 2015-04-01
Degree: Ph.D
Type: Dissertation
University: State University of New York at Albany
Candidate: Musial, Ewa
Full Text: PDF
GTID: 1478390017490881
Subject: Computer Science
Abstract/Summary:
This dissertation proposes new paradigms for improving the testing and reliability of service-oriented applications, as well as the quality of data. Because it is difficult to track information flowing through the multiple tiers of an application, testing service-oriented systems can be very challenging. We present a methodology for testing service-oriented applications that takes into account all of their components, including services, external services, and data components. The results of our experiments demonstrate that this approach greatly improves the effectiveness of testing service-oriented applications.

To examine the effects of invalid data on the reliability of services and service-oriented applications, we first developed an approach that quantifies the quality of database relations and computes the quality of data components based on the type of interactions they have with software components. We then developed a methodology that incorporates data quality into reliability modeling and can therefore better account for failures caused by invalid data. Our empirical results show that our model provides more accurate estimations of the reliability of service-oriented applications than traditional approaches, detecting 16% more faults.

Recognizing the importance of data quality, we developed an Entity Resolution (ER) algorithm that provides not only a blocking scheme but also a two-stage comparison selection process. It efficiently removes oversized blocks and identifies the comparisons that are most likely to contain duplicates. Our empirical results demonstrate the usefulness of our algorithm with respect to both efficiency and effectiveness: for the former, it reduces the number of comparisons that need to be resolved; for the latter, it increases the number of detected duplicates.
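
To make the general idea of blocking followed by comparison selection concrete, here is a minimal Python sketch. It is not the dissertation's algorithm, whose details the abstract does not give. It assumes simple token blocking, a hypothetical `max_size` threshold for dropping oversized blocks, and a hypothetical `min_shared` threshold that keeps only pairs of records sharing enough blocks. All function names and the sample records are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def token_blocks(records):
    """Assumed blocking scheme: one block per lowercase token."""
    blocks = defaultdict(set)
    for rid, text in records.items():
        for token in text.lower().split():
            blocks[token].add(rid)
    return blocks


def purge_oversized(blocks, max_size):
    """Drop singleton blocks and blocks larger than max_size (hypothetical threshold)."""
    return {k: ids for k, ids in blocks.items() if 2 <= len(ids) <= max_size}


def select_comparisons(blocks, min_shared):
    """Keep record pairs that co-occur in at least min_shared blocks."""
    shared = defaultdict(int)
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return sorted(pair for pair, n in shared.items() if n >= min_shared)


# Illustrative records, not data from the dissertation.
records = {
    1: "Anna Kowalska Albany",
    2: "A. Kowalska Albany NY",
    3: "John Smith Albany",
    4: "Jon Smith Boston",
}

blocks = purge_oversized(token_blocks(records), max_size=2)
candidates = select_comparisons(blocks, min_shared=1)
print(candidates)
```

In this toy run the common token "albany" forms a block of three records and is purged, so only the pairs sharing a rarer token remain as candidates, instead of all six possible pairs. Raising `min_shared` would prune candidates further, at the risk of missing duplicates.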
Keywords/Search Tags:Service-oriented applications, Data, Quality, Reliability, Methodology, Testing