Research On Middleware Of Heterogeneous Data Integration Based On XML Schema

Posted on: 2012-07-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Zhang
Full Text: PDF
GTID: 2178330338997096
Subject: Computer application technology
Abstract/Summary:
In recent years, the rapid development of computer, network, and information technology, together with the rapid decline in the cost of electronic products, has accelerated enterprise informatization, and in the process companies have accumulated large amounts of data. Because market conditions are complex and changing and competition among enterprises is fierce, companies need to access multiple data sources to improve their competitiveness. For various reasons these data sources are heterogeneous, so there is an urgent need to resolve the differences effectively and make it convenient for enterprises to access heterogeneous data sources.

The typical solution is to extract data from each data source, convert its format, store it in a central database, and then operate on the central database instead of on each data source. This solution, however, leads to a large amount of redundant data storage, which hurts business agility. An alternative is to integrate heterogeneous data sources using a database server and virtual views; its weakness is that it handles XML data and unstructured data ineffectively. How to provide an efficient, scalable, and reliable heterogeneous data integration middleware that offers a uniform interface to other information systems is currently a hot topic in data integration research.

This thesis analyzes the major issues in the field of heterogeneous data integration and proposes classifying data sources according to the characteristics of their storage structure. The solution divides data sources into three types: database, XML, and unstructured data sources, and integrates each type with a uniform method suited to its characteristics. It uses a database server to handle database data sources and an XQuery processor to handle XML data sources. For unstructured data sources, it converts the unstructured data to XML and then handles it as an XML data source. By taking advantage of XML Schema for describing data, the metadata can be extracted and converted, and a virtual database can then be built according to certain rules so that the data sources are presented to users in a uniform global view. The thesis designs HDAM (Heterogeneous Data Source Access Middle), modeled on the way JDBC lets developers access different databases. It describes the role of each functional module and the operation process, describes the data source registration interface and the user interface, and studies the algorithms for global query decomposition and local query conversion. Finally, the middleware was developed and tested, and an application of HDAM was demonstrated through a case. The test results confirmed the feasibility and correctness of the middleware designed in this research.

This work studied heterogeneous data integration based on the idea of data source classification and proposed using an XQuery processor to deal with XML data in the middleware. This improved the query efficiency of the middleware when handling XML data and unstructured data, and enhanced the scalability and access rate of the integrated middleware.
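
The classification idea described in the abstract can be illustrated with a minimal sketch. It is not the HDAM implementation: every function name, the registry layout, and the sample data are hypothetical. An in-memory SQLite database stands in for the database server, and the limited path queries of Python's ElementTree stand in for an XQuery processor. Query decomposition is also simplified: each registered source already carries its local query, so the global query only dispatches and merges.

```python
import sqlite3
import xml.etree.ElementTree as ET


def unstructured_to_xml(text):
    """Wrap each non-empty line of plain text in a <line> element."""
    root = ET.Element("document")
    for line in text.splitlines():
        if line.strip():
            ET.SubElement(root, "line").text = line.strip()
    return root


def query_database(conn, sql):
    """Run a local SQL query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]


def query_xml(root, path):
    """Run a local path query and return the text of every matching element."""
    return [element.text for element in root.findall(path)]


def run_local_query(source):
    """Dispatch one local query according to the source's registered type."""
    kind = source["type"]
    if kind == "database":
        return query_database(source["handle"], source["local_query"])
    if kind == "xml":
        return query_xml(source["handle"], source["local_query"])
    if kind == "unstructured":
        # Convert to XML first, then reuse the XML handler unchanged.
        converted = unstructured_to_xml(source["handle"])
        return query_xml(converted, source["local_query"])
    raise ValueError(f"unknown source type: {kind}")


def run_global_query(registry):
    """Send each source its local query and merge the results into one list."""
    results = []
    for name in sorted(registry):
        results.extend(run_local_query(registry[name]))
    return results


def build_demo_registry():
    """Register one hypothetical source of each of the three types."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (name TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?)", [("Alice",), ("Bob",)])
    xml_root = ET.fromstring(
        "<customers><customer><name>Carol</name></customer></customers>"
    )
    return {
        "a_db": {
            "type": "database",
            "handle": conn,
            "local_query": "SELECT name FROM customer ORDER BY name",
        },
        "b_xml": {
            "type": "xml",
            "handle": xml_root,
            "local_query": "./customer/name",
        },
        "c_text": {
            "type": "unstructured",
            "handle": "Dave\n\nErin\n",
            "local_query": "./line",
        },
    }


if __name__ == "__main__":
    print(run_global_query(build_demo_registry()))
```

Routing unstructured data through the XML handler mirrors the abstract's point that only two query paths are needed once unstructured sources are converted to XML.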
Keywords/Search Tags: Heterogeneous Data Integration, XML Schema, Middleware, Virtual Database, SQL