
Research And Realization Of Data Integration Technology Based On XML

Posted on: 2010-07-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Zhao
Full Text: PDF
GTID: 2178360272996885
Subject: Software engineering
Abstract/Summary:
With the rapid development of computer technology and networks, growing demands on database applications, and changes in computer hardware, database systems have evolved from small to large-scale and from centralized to distributed. At the same time, the office systems of institutions and enterprises have been upgraded along with computer technology. As a result, an institution or enterprise often runs many systems, each backed by a different database. It is increasingly important that these systems can communicate with one another and that users can access them transparently. Transparent access has a great influence on the business management of institutions and enterprises. Its goal is to share data in a loosely coupled, heterogeneous environment with differing data formats, across platforms and across boundaries, so that resources are used effectively and the performance of the whole system improves. Current data integration techniques for heterogeneous environments still have many problems, chiefly a high degree of coupling and implementations that are expensive and complicated. XML technology offers new ways to implement data integration. Because XML is platform-independent, easy to extend, highly interoperable, and semantically expressive, it has become the de facto standard for data integration.
An XML-based data integration model makes it easier to describe heterogeneous databases and to convert data between data sources, and thus to resolve the conversion relationships among existing heterogeneous databases. At present, several well-known database and middleware companies abroad have developed middleware products for integrating heterogeneous data sources, and some of these products are mature. However, they require a large amount of interface development work and are costly. Domestic companies and universities have also begun to attach importance to research in this area, but there are still very few complete data exchange and integration products and applications. Architecturally, current data integration approaches fall into three types: Data Warehouse, Federated Database, and Mediator/Wrapper. The Data Warehouse approach adds a layer between the clients and the data sources, called the data warehouse layer, which stores the data integrated from the various sources; the system provides a query mechanism over the data warehouse. The Federated Database approach maps data exchange formats between data sources so that one data source can access the information provided by the others, achieving interoperability among a number of independent data sources. The Mediator/Wrapper approach creates a unified view between the data sources and the users to shield heterogeneity, so that users can access the data sources transparently. The integrated system provides only a virtual integrated view and a query processing mechanism over that view, and it must automatically transform a user's query against the integrated schema into queries against the data sources. In this architecture the middle layer does not actually store data: when the user issues a query, the mediator simply forwards it to the appropriate data sources.
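The Mediator/Wrapper behaviour described above (a virtual view that stores no data and forwards each query to the sources able to answer it) can be sketched in Python as follows. The `Mediator` class, the `Wrapper` callable type, the `static_wrapper` helper, and the source names are hypothetical illustrations, not the design used in this thesis; the sketch assumes every source exposes the same global attribute names and simply unions the answers.

```python
from typing import Callable, Dict, List, Set, Tuple

Row = Dict[str, object]
# Hypothetical wrapper interface: given global attribute names, a wrapper
# returns rows keyed by those same global names.
Wrapper = Callable[[List[str]], List[Row]]


class Mediator:
    """Virtual integrated view: holds wrappers and schema metadata, never data."""

    def __init__(self) -> None:
        self._sources: Dict[str, Tuple[Wrapper, Set[str]]] = {}

    def register(self, name: str, wrapper: Wrapper, provides: Set[str]) -> None:
        """Record which global attributes a source's wrapper can supply."""
        self._sources[name] = (wrapper, set(provides))

    def query(self, attributes: List[str]) -> List[Row]:
        """Forward the query to every source that covers it and union the answers."""
        wanted = set(attributes)
        results: List[Row] = []
        for name, (wrapper, provides) in self._sources.items():
            if wanted <= provides:  # this source can answer the whole query
                # Data are fetched at query time, so answers reflect the sources' current state.
                for row in wrapper(attributes):
                    results.append({**row, "_source": name})
        return results


def static_wrapper(rows: List[Row]) -> Wrapper:
    """Stand-in for a real wrapper: projects in-memory rows onto the requested attributes."""
    return lambda attrs: [{a: r[a] for a in attrs} for r in rows]


if __name__ == "__main__":
    mediator = Mediator()
    mediator.register("bureau_a", static_wrapper([{"project": "P1", "budget": 100}]),
                      {"project", "budget"})
    mediator.register("bureau_b", static_wrapper([{"project": "P2", "budget": 250, "leader": "Li"}]),
                      {"project", "budget", "leader"})
    print(mediator.query(["project", "budget"]))  # rows from both sources
    print(mediator.query(["leader"]))             # only bureau_b can answer
```

Because the mediator keeps only the attribute-to-source metadata, adding a new source means registering one more wrapper; no data are copied into the middle layer.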
Among these approaches, Mediator/Wrapper is the most widely used. Because it does not need to store large amounts of redundant data and ensures that queries return the latest data, it is better suited to integrating large numbers of highly autonomous, fast-changing heterogeneous data sources. Drawing on a study of XML-related technologies and data integration techniques, and in combination with a specific project, this paper adopts an XML-based data integration model and studies the design and implementation of the wrapper. The wrapper designed in this paper consists of three modules: a wrapper generation module, a database synchronization update module, and a result mode conversion module. Here the wrapper generation module is designed as a data extraction module, whose core task is to let the user extract the required data from a specified data source through interaction with the system. In the detailed design of the wrapper, the data extraction module and the result mode conversion module are merged into one module, and the system submits data directly in XML form to the upper layer. The database synchronization update module gives the user an option that decides whether the data source is updated to reflect the newly generated database. The data extraction module obtains query results from structured data sources, but these results are still in the form of relational query results.
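A minimal sketch of the merged data extraction and result mode conversion module, which runs a query on a relational source and passes the result upward as XML, might look like this. It uses Python's built-in `sqlite3` as a stand-in for the MySQL source and `xml.etree.ElementTree` for the XML output; the function name `extract_as_xml`, the `project` table, and the one-element-per-column layout are hypothetical choices, not the thesis's own conversion template.

```python
import sqlite3
import xml.etree.ElementTree as ET


def extract_as_xml(conn: sqlite3.Connection, sql: str, root_tag: str, row_tag: str) -> str:
    """Run a query on a relational source and return the result set as an XML string.

    Each row becomes a <row_tag> element and each column a child element named
    after the column (this assumes column names are valid XML tag names).
    NULL values become empty elements.
    """
    cursor = conn.execute(sql)
    columns = [desc[0] for desc in cursor.description]  # column names from the result set
    root = ET.Element(root_tag)
    for record in cursor.fetchall():
        row_el = ET.SubElement(root, row_tag)
        for column, value in zip(columns, record):
            child = ET.SubElement(row_el, column)
            if value is not None:
                child.text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # In-memory SQLite database standing in for the relational data source.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE project (id INTEGER, name TEXT, budget REAL)")
    conn.executemany("INSERT INTO project VALUES (?, ?, ?)",
                     [(1, "Survey", 12.5), (2, "Portal", None)])
    print(extract_as_xml(conn, "SELECT id, name, budget FROM project ORDER BY id",
                         "projects", "project"))
```

Deriving element names from the query's own column list keeps the converter independent of any particular table; a fuller version would read the element names from a mapping template instead.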
The wrapper therefore needs to convert the data from the relational data model to the XML data model. Because the databases of the Technology Bureau in this project are all MySQL relational databases, the implementation in this paper works in a homogeneous setting. However, for the sake of scalability and practical applicability, the design and research in this paper target heterogeneous databases. The query translation module that a general wrapper contains is separated out here and studied on its own as the query processing module. Query translation receives user queries and translates them into local queries against the data sources. The wrapper presents users with a top-level XML view, and users send queries against this global view to the wrapper system; the system must then determine which data in the data source the user needs by reading the mode conversion template, issue a query to the data source, and extract the data the user requires. Query translation can also be understood as follows: faced with a large number of data sources, a data integration system needs to provide a consistent query interface, so that users only have to state explicitly what they want to query, without caring how the results are obtained. Compared with a traditional database, the main difference of an integration system is that users query a virtual mediated model. "Virtual" here means that the data are not stored in this model but remain in their respective data sources under their own models. Therefore, to obtain the user's query results, query processing in a data integration system must be able to reformulate the user's query over the mediated model and rewrite it into queries over the data source models, which requires describing the mappings between the mediated model and the data source models. As for the organization of this paper, it first introduces the research background and significance of the subject and the work to be done.
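Query translation against a mode conversion template can be illustrated with a small Python sketch that rewrites a query over the global XML view into SQL for one local table. The template format (an XML path mapped to a table and column), the names `MAPPING` and `translate`, and the `proj_info` schema are all hypothetical; a real translator would also have to handle joins, multiple sources, and non-equality predicates.

```python
from typing import Dict, List, Tuple

# Hypothetical mode conversion template: global XML view path -> (table, column)
# in one local relational source.
MAPPING: Dict[str, Tuple[str, str]] = {
    "/projects/project/name": ("proj_info", "proj_name"),
    "/projects/project/budget": ("proj_info", "budget_wan"),
    "/projects/project/dept": ("proj_info", "dept_code"),
}


def translate(paths: List[str], conditions: Dict[str, object],
              mapping: Dict[str, Tuple[str, str]]) -> Tuple[str, List[object]]:
    """Rewrite a query over the global XML view into parameterized SQL on one local table.

    `paths` are the view elements to return; `conditions` are equality filters
    keyed by view path. Raises ValueError if the query spans more than one table.
    """
    tables = {mapping[p][0] for p in list(paths) + list(conditions)}
    if len(tables) != 1:
        raise ValueError("this sketch only handles queries answered by a single table")
    table = tables.pop()
    # Alias each local column back to its global element name (last path step).
    select = ", ".join(f"{mapping[p][1]} AS {p.rsplit('/', 1)[1]}" for p in paths)
    sql = f"SELECT {select} FROM {table}"
    params: List[object] = []
    if conditions:
        clauses = []
        for path, value in conditions.items():
            clauses.append(f"{mapping[path][1]} = ?")  # value is bound, not inlined
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


if __name__ == "__main__":
    print(translate(["/projects/project/name", "/projects/project/budget"],
                    {"/projects/project/dept": "D01"}, MAPPING))
```

Aliasing columns back to their global element names lets the local result feed straight into an XML conversion step, and the `?` placeholders keep user-supplied values out of the SQL text.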
It then discusses the foundations of data integration, including background knowledge on XML and on data integration. Next, it presents the main contribution of this paper, the design and implementation of the wrapper. After that, it analyzes data query processing and optimization. Finally, drawing on the application of data integration in the information system platform, it analyzes the functions and implementation process of each module. The system provides users with a global model view through which data from the various data sources can be seen, while the data themselves remain stored in the local data sources. The middleware sits between the heterogeneous data sources and the application programs: downward, it coordinates the heterogeneous data sources; upward, it provides a unified data model and a common data access interface for applications that access the integrated data...
Keywords/Search Tags: XML, Data Integration, Wrapper, Query Processing