
The Study And Implementation Of Data Integration Based On Data Service Matching

Posted on: 2008-05-31
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X S Xie
GTID: 1118360242464744
Subject: Signal and Information Processing
Abstract/Summary:
Building a Data Integration System (DIS), which supports data analysis and management decisions by making full use of dispersed heterogeneous data, has become an active research topic. The goal of a DIS is to integrate data from various distributed heterogeneous data sources without affecting the operation of the applications that produce the data. However, designing a DIS is a difficult and complex task, especially in a rapidly evolving network environment where data volumes grow quickly. After studying the existing approaches and techniques of data integration, and combining them with some of our own engineering experiments, this thesis presents an intensive and systematic study of data integration, with the aim of developing a DIS that adapts to complex network environments and offers good performance and scalability. The main contents and contributions of the thesis are as follows:

1. A specification language for data integration (DISL) is put forward, and a heterogeneous data integration platform based on DISL (DISL-Platform) is developed. By mapping each of its components to a corresponding meta-graphical-unit, DISL supports the definition of integration process semantics, such as data extraction, data conversion and data merging, at a high level of abstraction. The DISL-Platform can be used to assist in constructing DISL-Mediators graphically. A DISL-Mediator is a complete DISL program package that defines the semantics of a specific process for integrating data from one or more data sources in the same local network; it can be executed interpretively by an execution engine. The DISL-Platform has been integrated into a general DIS based on the data service concept, in which the core techniques of the DISL-Platform are used to help implement some important components on the data source side. At present, the DISL-Platform has been checked and accepted by the Electric Power Corporation of ShanXi (SX-EPC) and has been in trial operation in several SX-EPC sub-corporations, such as TaiYuan-EPC and YanQuan-EPC. In addition, the DISL-Platform has been registered under a National Software Copyright (Register No: 2005SR12507, Copyright No: 044008).

2. A multi-layer organizing structure for data in a Data Warehouse (DW) is proposed. In addition to the analysis-data layer, which exists in any DW, this scheme introduces a new layer of normalized business data. The business-data layer, in which data schemas are designed in a normalized and non-redundant style, can be further divided into several sub-layers. The analysis-data layer, in which data are organized by analysis subject, may also be further divided into several sub-layers according to data granularity, and is allowed to introduce some redundant data schemas to improve query performance. This scheme can effectively improve the ability of the DW to adapt to changing needs.

3. A mobile agent platform, Aglets, is functionally extended to help collect scattered data. The extended-Aglets has been successfully integrated into a DIS based on the data service concept, in which it is used to collect the result sets of all data service cells (DS-Cells) executed on different network nodes. Some test results show that the mobile agent can significantly improve performance in collecting dispersed data.

4. Based on research on description logics (DL) and DL reasoning algorithms, an optimized and improved algorithm for computing the concept architecture tree of ontologies is described. This algorithm makes full use of the explicit (told) knowledge in the DL knowledge base to greatly reduce the number of reasoning calls. In addition, it helps compute the super-concepts, sub-concepts, equivalent concepts, disjoint concepts and instances of a specified concept in the concept architecture tree more conveniently.

5. An intelligent data service matching algorithm for retrieving DS-Cells is presented. This algorithm employs a hybrid approach that complements logic-based computation of concept relations with syntactic similarity matching of concepts. The experimental tests show that the algorithm works well and gives a good solution to the problem of judging whether two concept sets match.

6. A new method for constructing a DIS based on the data service concept is proposed. This approach implements centralized management of distributed data by publishing and registering, in the data service center, all DS-Cells that come from the different network nodes able to provide data resources. By effectively combining the techniques of the Semantic Web, OWL and description logic, this method makes intensive use of the formal semantics of data and of reasoning knowledge based on ontology concepts to retrieve the matching DS-Cells from the data service registry. In this way, dynamic data query processing is implemented. Based on this method, we have developed a relatively complete data integration prototype system, and have completed the design and debugging of its major algorithms and the implementation of some modules. Experimental results on the reliability and performance of the data service matcher, a core component of the target system, show that the system works reliably with high flexibility and performance, and can support distributed heterogeneous data integration in a uniform, transparent mode.
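The hybrid matching idea in contribution 5 can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the concept names, the `TOLD_PARENTS` table (a stand-in for a DL reasoner working over told subsumption axioms), the scores 1.0/0.8/0.6, the weight `alpha` and the `threshold` are all assumptions chosen for illustration. The sketch first looks for a logical relation between two concepts and falls back to string similarity only when no relation is found.

```python
from difflib import SequenceMatcher

# Hypothetical told-subsumption axioms: concept -> direct super-concepts.
# A real system would obtain these relations from a DL reasoner.
TOLD_PARENTS = {
    "PowerMeterReading": {"Measurement"},
    "Measurement": {"Data"},
    "Customer": {"Party"},
}


def ancestors(concept, parents=TOLD_PARENTS):
    """Return all told super-concepts of `concept` (transitive closure)."""
    seen, stack = set(), [concept]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def logical_degree(requested, offered):
    """Score the logical relation between two concept names (assumed scale)."""
    if requested == offered:
        return 1.0  # same concept
    if requested in ancestors(offered):
        return 0.8  # offered concept is more specific than the request
    if offered in ancestors(requested):
        return 0.6  # offered concept is more general than the request
    return 0.0      # no told relation


def syntactic_degree(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def concept_match(requested, offered, alpha=0.5):
    """Use the logical score when one exists, else a damped syntactic score."""
    logic = logical_degree(requested, offered)
    if logic > 0:
        return logic
    return alpha * syntactic_degree(requested, offered)


def set_match(requested_set, offered_set, threshold=0.4):
    """Judge whether every requested concept has a good enough counterpart.

    Returns (matched, score), where score is the weakest best-match.
    """
    if not requested_set:
        return True, 1.0
    if not offered_set:
        return False, 0.0
    best = [max(concept_match(r, o) for o in offered_set) for r in requested_set]
    score = min(best)
    return score >= threshold, score


if __name__ == "__main__":
    request = {"Measurement", "Customer"}
    ds_cell = {"PowerMeterReading", "Customers"}
    print(set_match(request, ds_cell))
```

Damping the syntactic score by `alpha` keeps a purely lexical match below any logically derived match, which is one plausible way to let string similarity complement, rather than override, the logic-based result.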
Keywords/Search Tags: data integration, data service matching, semantic Web, DL-reasoning, OWL, data warehouse, mobile agent, DISL-Mediator