
Research Of Ontology-based Method For Semantic Heterogeneity Resolution In Data Integration

Posted on: 2011-12-03 | Degree: Master | Type: Thesis
Country: China | Candidate: X H Yan
GTID: 2178360302497786 | Subject: Computer software and theory
Abstract/Summary:
With the rapid development of database and network technology, a large number of heterogeneous data sources have come into being. As demand for the comprehensive use of information grows, integrating these heterogeneous sources has become an urgent need. Thanks to advances in middleware technology, heterogeneity in hardware platforms, network protocols, and operating systems is no longer the main concern. ETL and other newer technologies have diversified the methods of data integration, giving people access to richer data and dramatically increasing data volumes. However, general-purpose integration tools such as ETL can only eliminate heterogeneity at the syntactic and structural levels; they lack effective means of eliminating semantic heterogeneity. Most traditional data cleaning techniques operate on text values and ignore the semantic information the data itself carries, which leads to data quality problems in integration. Data is the carrier of information, and its value lies in its quality. High-quality data is the basis for obtaining meaningful results from analysis tools such as data mining and OLAP, and a decision support system built on low-quality data is unreliable. The large amount of low-quality data produced during integration has become the key factor limiting data applications, and data quality has taken on increasing importance in recent years. How to integrate high-quality data is therefore a problem that must be solved quickly. In our research, we found that many "data quality" problems are actually "data misinterpretation" problems, that is, problems caused by heterogeneous data semantics. Because data sources are autonomous and belong to different domains, the information resources they hold are semantically heterogeneous.
Semantic heterogeneity has become the biggest hidden cause of data quality problems in data integration. Traditional methods depend too heavily on the constraints imposed by the data schema: they neither consider the semantic constraints of the application domain nor use the semantic information carried by the data itself. How to introduce data semantics to improve data quality has therefore become a new research focus in integration processing. On this basis, this thesis proposes an ontology-based method for resolving the semantic heterogeneity that causes data quality problems in heterogeneous data integration.
The dissertation is organized as follows. First, in the context of informatization, we discuss the objectives of heterogeneous data integration and what heterogeneous data means, analyze traditional data integration methods and the state of international research on applying ontologies, summarize the shortcomings of existing integration methods, and argue that ontologies and related technologies can be used to resolve semantic heterogeneity. Second, we analyze semantic heterogeneity in databases, classify the semantic conflicts it gives rise to, and review research on ontologies and related technologies as the theoretical basis for this work. Third, using a domain ontology to capture the common characteristics of the ER model, we propose an approach for eliminating schematic conflicts: we formally describe the context of the metadata in the database conceptual model and transform the semantic information hidden in that metadata into attribute values of entity types, thereby converting the source data into the target schema.
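As a rough illustration of the metadata-to-data conversion described above, the Python sketch below assumes a source table that encodes the year in its column names (oil_2009, oil_2010), while the target schema has an explicit year attribute. ColumnContext, lift_metadata, and all table, column, and variety names are hypothetical stand-ins for a formal context description; this is not the thesis's own formalism, and the sample values are made up.

```python
from dataclasses import dataclass


# Hypothetical context annotation for one source column: the column name
# itself encodes a value (here a year) that the target schema stores as data.
@dataclass(frozen=True)
class ColumnContext:
    measure: str       # target attribute that receives the cell value
    hidden_attr: str   # target attribute whose value is hidden in the column name
    hidden_value: str  # that value, made explicit for this column


def lift_metadata(rows, key, contexts):
    """Rewrite rows whose column names carry data (a schematic conflict)
    into rows of the target schema, where that information becomes an
    ordinary attribute value of the entity type."""
    target_rows = []
    for row in rows:
        for column, ctx in contexts.items():
            if column not in row:  # this source row lacks the column
                continue
            target_rows.append({
                key: row[key],
                ctx.hidden_attr: ctx.hidden_value,
                ctx.measure: row[column],
            })
    return target_rows


# Illustrative source with one column per year (values are made up).
source = [{"variety": "variety_A", "oil_2009": 41.2, "oil_2010": 42.0}]
contexts = {
    "oil_2009": ColumnContext("oil_content", "year", "2009"),
    "oil_2010": ColumnContext("oil_content", "year", "2010"),
}
result = lift_metadata(source, "variety", contexts)
print(result)
```

Keeping the context annotations separate from the conversion function means a new source with a different column-naming convention only needs new ColumnContext entries, not new code.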
After schematic conflicts have been eliminated, we exploit the fact that an ontology can express richer semantics than a relational schema: we build an ontology that extends the semantics of the relational schema and use it to detect and eliminate data-level conflicts. Finally, based on the proposed methods, we complete the data integration of the rapeseed molecular database project, effectively solving the data quality problems caused by semantic heterogeneity.
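The data-level step can be sketched in the same hedged way. Here a plain Python dictionary stands in for the domain ontology (a real system would typically express it in a language such as OWL); it maps source-specific attribute names to a shared concept and fixes a canonical unit. Values that agree once normalized are merged, and values that still differ are reported as conflicts. ONTOLOGY, normalize, detect_conflicts, the concept OilContent, and the sample records are all hypothetical and are not taken from the thesis.

```python
import math
from collections import defaultdict

# Hypothetical mini-ontology: synonyms map source attribute names to a shared
# concept, and each concept has a canonical unit with conversion factors.
ONTOLOGY = {
    "synonyms": {"oil_content": "OilContent", "oil%": "OilContent"},
    "canonical_unit": {"OilContent": "percent"},
    "unit_factor": {("percent", "percent"): 1.0, ("fraction", "percent"): 100.0},
}


def normalize(record, onto):
    """Map a source record onto its ontology concept, in the canonical unit."""
    concept = onto["synonyms"][record["attribute"]]
    unit = onto["canonical_unit"][concept]
    value = record["value"] * onto["unit_factor"][(record["unit"], unit)]
    return (record["entity"], concept), value


def detect_conflicts(*sources, onto=ONTOLOGY):
    """Return (merged, conflicts). Values that agree after normalization were
    only representation conflicts and are merged; values that still disagree
    are reported as genuine data-level conflicts."""
    values = defaultdict(list)
    for source in sources:
        for record in source:
            key, value = normalize(record, onto)
            values[key].append(value)
    merged, conflicts = {}, {}
    for key, vals in values.items():
        if all(math.isclose(v, vals[0], rel_tol=1e-9) for v in vals):
            merged[key] = vals[0]
        else:
            conflicts[key] = vals
    return merged, conflicts


# Illustrative sources using different names and units (values are made up).
source_a = [
    {"entity": "variety_A", "attribute": "oil_content", "value": 42.0, "unit": "percent"},
    {"entity": "variety_B", "attribute": "oil_content", "value": 40.0, "unit": "percent"},
]
source_b = [
    {"entity": "variety_A", "attribute": "oil%", "value": 0.42, "unit": "fraction"},
    {"entity": "variety_B", "attribute": "oil%", "value": 0.38, "unit": "fraction"},
]
merged, conflicts = detect_conflicts(source_a, source_b)
print(merged)
print(conflicts)
```

A purely text-based cleaner would see "oil_content = 42.0" and "oil% = 0.42" as unrelated or conflicting; only the shared concept and unit knowledge reveal that they agree, while variety_B remains a real conflict to be resolved.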
Keywords: Ontology, Data integration, Semantic heterogeneity, Schematic conflict