
Research On Issues Of Data Integration Technology

Posted on: 2011-03-18 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: X Wang
GTID: 1118360305456628 | Subject: Computer software and theory
Abstract/Summary:
Recently, the development and application of computer technology have accelerated the development of information technology. Data integration is an important means of solving the problem of "isolated information islands". A good data integration system, built on consistent data, can serve users at low cost and run efficiently. This dissertation therefore addresses data inconsistency in data integration from two aspects:

1) Obtaining consistent query results by resolving inconsistent data in the query results of a data integration system. In this aspect, the importance of data source quality for user query results is studied and a solution for data inconsistency is provided. In addition, a strategy and an algorithm are provided for pervasive computing environments, and a prototype system is designed and developed.
2) Collecting inconsistent data in the data integration system so that they can be resolved by experts. In this aspect, a method is provided for ordering the candidate inconsistent data sets for resolution so that the system obtains the maximum benefit.

Specifically, the main contributions of this dissertation are as follows:

1) Based on data source quality, a solution for data inconsistency in data integration is provided. Data source quality criteria are defined and a data model is designed for data integration. Based on this data model, data inconsistency in data integration is formally defined. To process both the qualitative and the quantitative values of the data source criteria, a fuzzy multi-attribute decision making method is introduced to resolve inconsistent data. The experimental results show that the solution is effective.

2) Pervasive computing environments introduce significant new challenges for data integration, such as the dynamics of data source quality. According to the characteristics of pervasive computing environments, a solution for data inconsistency in data integration is provided based on a fuzzy multi-attribute group decision making method. Data source quality is considered from two aspects: data quality criteria and expense quality criteria. Among the data quality criteria, a new data source property, "history credibility", is defined; it represents how "right" the data provided by each data source has been in the history of data inconsistency resolution, and it can be adjusted according to user feedback. The first stage of the solution is based on expense quality and a utility function. In the second stage, a fuzzy multi-attribute group decision making approach based on the data quality criteria is used to select the appropriate data source, whose data is taken as the resolution. The experimental results show that the solution achieves the desired effectiveness.

3) When inconsistent data are resolved by domain experts in a data integration system, a solution based on the value of perfect information is provided to increase the efficiency of the experts and obtain the most benefit for the system. A data model is designed to describe inconsistent data. System utility is estimated from query result quality, and formulas for the system benefit and the value of perfect information are defined. All candidate inconsistent data sets are then ordered by their value of perfect information so that the system obtains the most benefit. The experiments give good results.

4) Based on these solutions, a prototype system, the Expo Data Integration System, is provided. Built on a service bus and XML, the system introduces credible data integration technology. The core of the system is schema-based integration, and the data sources in the system are packaged as web services. Users can choose whether data inconsistency is resolved, which increases the flexibility of the system. The system adopts an approximate object-oriented data schema management method to describe and integrate data, and it resolves inconsistent data based on a utility function and a fuzzy multi-attribute group decision making method.
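The abstract does not name the specific fuzzy multi-attribute decision making method used in contribution 1). As a minimal sketch, assuming fuzzy simple additive weighting with triangular fuzzy numbers (TFNs) and centroid defuzzification (a swapped-in textbook technique, not necessarily the dissertation's), the Python code below shows how qualitative and quantitative quality ratings of data sources can be combined to pick the source whose conflicting value is returned. The criterion names, the linguistic scale, the weights, and the data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TFN:
    """Triangular fuzzy number (l, m, u) with l <= m <= u."""
    l: float
    m: float
    u: float

    def __add__(self, other):
        return TFN(self.l + other.l, self.m + other.m, self.u + other.u)

    def __mul__(self, other):
        # Standard approximation of the product of two non-negative TFNs.
        return TFN(self.l * other.l, self.m * other.m, self.u * other.u)

    def centroid(self):
        # Defuzzify to a crisp score.
        return (self.l + self.m + self.u) / 3.0


# Hypothetical linguistic scale for qualitative criteria.
LINGUISTIC = {
    "low": TFN(0.0, 0.0, 0.25),
    "medium": TFN(0.25, 0.5, 0.75),
    "high": TFN(0.75, 1.0, 1.0),
}


def to_tfn(rating):
    """Map a qualitative rating (str) or a quantitative value in [0, 1] to a TFN."""
    if isinstance(rating, str):
        return LINGUISTIC[rating]
    v = float(rating)
    return TFN(v, v, v)


def resolve(conflicting, ratings, weights):
    """Pick one value among conflicting source values by fuzzy simple additive weighting.

    conflicting: source -> value it reports for the same attribute
    ratings:     source -> {criterion: rating}; all criteria are treated as benefit criteria
    weights:     criterion -> TFN importance weight
    """
    scores = {}
    for source in conflicting:
        total = TFN(0.0, 0.0, 0.0)
        for criterion, weight in weights.items():
            total = total + to_tfn(ratings[source][criterion]) * weight
        scores[source] = total.centroid()
    best = max(scores, key=scores.get)
    return conflicting[best], scores


# Toy example: three sources report different values for the same attribute.
conflicting = {"S1": 41.5, "S2": 42.0, "S3": 39.9}
ratings = {
    "S1": {"accuracy": "high", "timeliness": 0.6},
    "S2": {"accuracy": "medium", "timeliness": 0.9},
    "S3": {"accuracy": "low", "timeliness": 0.8},
}
weights = {"accuracy": TFN(0.5, 0.7, 0.9), "timeliness": TFN(0.1, 0.3, 0.5)}
value, scores = resolve(conflicting, ratings, weights)
print(value, scores)
```

Treating a crisp value as a degenerate TFN (l = m = u) is what lets one aggregation handle both qualitative and quantitative criteria.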
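Contribution 2) combines a utility-function stage over expense quality criteria with a fuzzy multi-attribute group decision making stage over data quality criteria, including a feedback-adjusted "history credibility". The abstract gives no formulas, so the following is only a sketch under stated assumptions: a linear utility over normalized expense criteria with a hypothetical threshold for stage 1; arithmetic averaging of the decision makers' triangular fuzzy ratings with centroid defuzzification for stage 2; a fixed blend weight `w_hist` for history credibility; and an exponential moving-average update (`ALPHA`) driven by user feedback. All names, weights, and data are hypothetical.

```python
from statistics import mean

ALPHA = 0.2  # hypothetical learning rate for feedback-driven credibility updates


def expense_utility(expense, weights):
    """Linear utility over expense criteria where lower is better (e.g. cost, latency).

    expense: source -> {criterion: value}; weights: criterion -> weight (summing to 1).
    """
    maxima = {k: max(e[k] for e in expense.values()) or 1.0 for k in weights}
    return {
        source: sum(w * (1.0 - e[k] / maxima[k]) for k, w in weights.items())
        for source, e in expense.items()
    }


def group_score(ratings, source):
    """Average every decision maker's triangular fuzzy rating (l, m, u), then defuzzify."""
    tfns = [dm[source] for dm in ratings]
    l, m, u = [mean(t[i] for t in tfns) for i in range(3)]
    return (l + m + u) / 3.0


def select_source(expense, weights, threshold, ratings, credibility, w_hist=0.4):
    """Two-stage selection of the source whose value resolves the conflict."""
    # Stage 1: keep sources whose expense utility reaches the threshold (fall back to all).
    utility = expense_utility(expense, weights)
    candidates = [s for s, util in utility.items() if util >= threshold] or list(utility)
    # Stage 2: blend the group's fuzzy data-quality score with history credibility.
    scores = {
        s: (1 - w_hist) * group_score(ratings, s) + w_hist * credibility[s]
        for s in candidates
    }
    return max(scores, key=scores.get), scores


def update_credibility(credibility, source, user_says_correct):
    """Move a source's history credibility toward 1 (confirmed) or 0 (rejected)."""
    target = 1.0 if user_says_correct else 0.0
    credibility[source] = (1 - ALPHA) * credibility[source] + ALPHA * target


expense = {
    "S1": {"cost": 5.0, "latency": 200.0},
    "S2": {"cost": 1.0, "latency": 50.0},
    "S3": {"cost": 2.0, "latency": 400.0},
}
ratings = [  # one dict per decision maker: source -> (l, m, u)
    {"S1": (0.7, 0.9, 1.0), "S2": (0.3, 0.5, 0.7), "S3": (0.5, 0.7, 0.9)},
    {"S1": (0.6, 0.8, 1.0), "S2": (0.4, 0.6, 0.8), "S3": (0.4, 0.6, 0.8)},
]
credibility = {"S1": 0.9, "S2": 0.5, "S3": 0.8}

best, scores = select_source(expense, {"cost": 0.5, "latency": 0.5}, 0.28, ratings, credibility)
print(best, scores)
update_credibility(credibility, best, user_says_correct=False)  # user rejects the answer
print(credibility)
```

Here S1 has the best group rating but is filtered out in stage 1 because it is the most expensive source, which is the kind of trade-off the two-stage design makes explicit.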
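Contribution 3) orders candidate inconsistent data sets by their value of perfect information. The textbook definition of the expected value of perfect information (EVPI) is the expected utility of deciding after the true state is known minus the best expected utility of deciding now. The sketch below applies that definition under assumptions of its own, not the dissertation's formulas: each candidate set carries a probability for each conflicting value and an expected query frequency, returning the correct value has utility 1 and a wrong one 0, and system benefit is frequency times EVPI. The function names and the data are hypothetical.

```python
def evpi(probs, utility):
    """Expected value of perfect information for one decision.

    probs:   state -> probability that the state is true
    utility: action -> {state: utility of taking the action in that state}
    EVPI = sum_s p(s) * max_a U(a, s) - max_a sum_s p(s) * U(a, s)
    """
    with_info = sum(p * max(utility[a][s] for a in utility) for s, p in probs.items())
    without_info = max(sum(p * utility[a][s] for s, p in probs.items()) for a in utility)
    return with_info - without_info


def order_for_experts(candidate_sets):
    """Order candidate inconsistent data sets by expected benefit of expert resolution.

    candidate_sets: set id -> (expected query frequency, {candidate value: probability})
    Returning candidate value a when value s is true has utility 1 if a == s, else 0.
    """
    benefit = {}
    for set_id, (frequency, probs) in candidate_sets.items():
        utility = {a: {s: 1.0 if a == s else 0.0 for s in probs} for a in probs}
        benefit[set_id] = frequency * evpi(probs, utility)
    return sorted(benefit, key=benefit.get, reverse=True), benefit


candidate_sets = {
    "price#17": (120, {"9.99": 0.5, "10.49": 0.5}),
    "city#3": (300, {"Paris": 0.9, "Lyon": 0.1}),
    "year#8": (80, {"2009": 0.4, "2010": 0.35, "2011": 0.25}),
}
order, benefit = order_for_experts(candidate_sets)
print(order, benefit)
```

With 0/1 utility, EVPI reduces to 1 minus the largest candidate probability, so a frequently queried set whose candidates are nearly tied is sent to the experts first, while a heavily queried set with one dominant candidate can wait.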
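Contribution 4) describes the prototype only at the architectural level: data sources packaged as web services, XML results, and a user-selectable inconsistency resolution step. The minimal sketch below assumes plain Python callables in place of web-service wrappers and a majority vote as a placeholder resolver (the prototype itself uses a utility function and fuzzy multi-attribute group decision making); it shows how a mediator might expose the "resolve or not" choice. The functions `mediate` and `majority` and the XML element names are hypothetical, and no service bus is modeled.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def majority(values):
    """Placeholder resolver: the most common value (ties go to the value seen first)."""
    return Counter(values.values()).most_common(1)[0][0]


def mediate(entity_id, sources, resolve=True, choose=majority):
    """Query every wrapped source for one entity and return the answer as XML.

    sources: source name -> callable(entity_id) returning {attribute: value}
    resolve: user-selectable; if False, every conflicting value is returned as-is
    choose:  resolver for conflicting values (stand-in for the decision-making step)
    """
    answers = {name: fetch(entity_id) for name, fetch in sources.items()}
    root = ET.Element("result", id=entity_id)
    for attr in sorted({a for record in answers.values() for a in record}):
        values = {name: rec[attr] for name, rec in answers.items() if attr in rec}
        elem = ET.SubElement(root, "attribute", name=attr)
        if resolve:
            elem.text = str(choose(values))
        else:
            for name, value in values.items():
                ET.SubElement(elem, "value", source=name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical wrapped sources standing in for web-service calls.
sources = {
    "S1": lambda e: {"name": "Alpha", "phone": "111"},
    "S2": lambda e: {"name": "Alpha", "phone": "222"},
    "S3": lambda e: {"name": "Alfa"},
}
print(mediate("e1", sources, resolve=True))
print(mediate("e1", sources, resolve=False))
```

Keeping the resolver as a parameter means the placeholder vote can be swapped for a quality-based decision procedure without changing the mediator or the XML contract.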
Keywords: Data integration, Data inconsistency, Quality criteria, Fuzzy multi-attribute decision making, Pervasive computing, Fuzzy multi-attribute group decision making, System utility, Value of perfect information