| With the rapid development of the Internet, it has become increasingly urgent for enterprises to adopt information technology, and the management of business data is a particularly important part of that effort. How to obtain useful data from heterogeneous environments, and then combine and use it, has attracted wide attention; this is the problem of heterogeneous data integration. Based on a comprehensive analysis of current theories and methods for integrating heterogeneous data, the thesis focuses on solving the key problems in such a system. The thesis first presents an integration model for heterogeneous data. The extraction process is driven by users' expectations and requirements for integrated data. When a data source is added to the system, only its description is stored in the metadata database, not the source data itself. The model also integrates all databases that expose a JDBC/ODBC interface, future data warehouses with a JOLAP interface, file systems of all kinds, and data from web pages. Next, the thesis designs and implements the integration model for heterogeneous database systems and file systems, providing a common data access interface that greatly improves the system's integration capability. Then, based on three research hypotheses about web data integration, the thesis presents the rule tree generation process, which comprises pre-processing, producing the HTML tree, producing the model tree, acquiring mapping rules, producing the rule tree, maintaining the rule tree, and implementing the Wrapper. Because web pages change frequently, the thesis also proposes a Wrapper maintenance process, which comprises identifying data features, defining semantic blocks, and repairing the rule tree. 
Test data verify that this method is suitable for web data extraction. Finally, the thesis summarizes the research on implementing a heterogeneous data integration system and outlines directions for future work. |
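
The abstract states that adding a data source stores only its description in the metadata database, and that sources are then reached through a common data access interface. The following is a minimal sketch of that idea, not the thesis's implementation. The names `SourceDescription`, `MetadataCatalog`, `Integrator`, and the reader functions are hypothetical and chosen only for illustration, and an in-memory dictionary stands in for the metadata database.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SourceDescription:
    """Describes where a source lives; holds no source data (hypothetical type)."""
    name: str
    kind: str       # e.g. "jdbc", "file", "web" (illustrative labels)
    location: str   # connection string, path, or URL


class MetadataCatalog:
    """In-memory stand-in for the metadata database (an assumption)."""

    def __init__(self) -> None:
        self._sources: Dict[str, SourceDescription] = {}

    def register(self, desc: SourceDescription) -> None:
        # Only the description is recorded; the source itself is not read.
        self._sources[desc.name] = desc

    def describe(self, name: str) -> SourceDescription:
        return self._sources[name]

    def names(self) -> List[str]:
        return sorted(self._sources)


# A reader turns a description into rows when a user actually asks for data.
Reader = Callable[[SourceDescription], List[dict]]


class Integrator:
    """Common access interface: one fetch() call regardless of source kind."""

    def __init__(self, catalog: MetadataCatalog, readers: Dict[str, Reader]) -> None:
        self._catalog = catalog
        self._readers = readers

    def fetch(self, name: str) -> List[dict]:
        desc = self._catalog.describe(name)
        # Dispatch on the source kind; data is pulled on demand.
        return self._readers[desc.kind](desc)
```

In this sketch, a caller would register one description per source and supply one reader per source kind; `fetch` is the only point at which a source is actually contacted, which mirrors the user-driven extraction described above.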