The Knowledge-Based Enterprise Heterogeneous Data Integration

Posted on: 2011-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: M D Cao
GTID: 2178360308460927
Subject: Computer Science and Technology
Abstract/Summary:
In recent decades, with the rapid development of information technology, the amount of data accumulated has exceeded the total of the past 5,000 years, and the volume of data collected, stored, processed, and propagated grows daily. To make full use of existing data resources without repeating the work of data collection, enterprises need to carry out data integration so that data can be shared among departments. Enterprise heterogeneous data integration is a technology for integrating distributed, heterogeneous information sources, allowing users to access these data sources transparently for information retrieval, analysis, and processing.

Existing integration technologies can be divided into logical integration and physical integration. For enterprise information, where the emphasis is on analyzing and mining commercially valuable information from accumulated data, the physical approach is more appropriate. The most important technology in the physical integration process is ETL (extract, transform, and load). A number of ETL products currently exist, mainly based on graphical job configuration with embedded executable scripts, but they all lack "memory", "recommendation", and other intelligent support. In this thesis, ontology and a rule engine are introduced to study a knowledge-based intelligent data integration solution.

The thesis puts forward a data integration framework based on a knowledge base. The framework connects data integration with intelligent technologies such as the knowledge base, inference engine, and rules, and highlights the use of "knowledge" in the data integration process. First, the framework structure is presented and each component is introduced. The thesis then focuses on the design and analysis of the knowledge base, which comprises the semantic base, mapping base, and rule base. To automate schema mapping, a new algorithm is presented that is based on historical mapping information and the rules in the rule base. Rule management is analyzed, and a solution within the data integration framework is given to address rules, the rule engine, and rule management. Finally, a personal information integration system is implemented under the proposed framework, incorporating the preceding studies and providing integrated data query and visual analysis capabilities.

Compared with existing data integration tools, the proposed framework has the following advantages: constantly accumulating knowledge provides the basis for intelligent data integration; the semantic base eliminates difficult semantic conflicts in the integration process; the semi-automatic schema mapping method saves time and manpower; a flexible rule configuration mechanism is provided; and a visual analysis function is provided.
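
As an illustration only, the idea of semi-automatic schema mapping driven by mapping history and a rule base might be sketched as follows. The abstract does not give the thesis's actual algorithm, so everything here is an assumption: the names `MAPPING_HISTORY`, `RULES`, `suggest_mapping`, and `confirm_mapping`, the sample field names, and the order in which history and rules are consulted are all hypothetical.

```python
# Hypothetical mapping base: previously confirmed source -> target field mappings.
MAPPING_HISTORY = {
    "cust_name": "customer_name",
    "tel": "phone",
}

# Hypothetical rule base: each rule pairs a condition on the source field
# name with a function that proposes a target field name.
RULES = [
    (lambda field: field.endswith("_dt"), lambda field: field[:-3] + "_date"),
]


def suggest_mapping(source_field, target_fields):
    """Propose a target field for source_field.

    Returns (target, origin), where origin is "history", "rule", or
    "manual" (no suggestion; a person has to decide).
    """
    # Step 1 (assumed order): reuse a mapping that was confirmed before.
    remembered = MAPPING_HISTORY.get(source_field)
    if remembered in target_fields:
        return remembered, "history"

    # Step 2: try each rule and accept the first proposal that exists
    # in the target schema.
    for condition, propose in RULES:
        if condition(source_field):
            candidate = propose(source_field)
            if candidate in target_fields:
                return candidate, "rule"

    # Step 3: nothing applies, so the mapping is left to the user.
    return None, "manual"


def confirm_mapping(source_field, target_field):
    """Record a user-confirmed mapping so later runs can reuse it."""
    MAPPING_HISTORY[source_field] = target_field
```

In this sketch, a field that falls through to "manual" becomes automatic on later runs once `confirm_mapping` has recorded the user's decision, which is one simple way to read the abstract's point about accumulated knowledge supporting the integration process.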
Keywords/Search Tags:data integration, knowledge base, schema mapping, rule management