Research And Realization On Instance-based Data Matching In Semantic Web

Posted on: 2009-09-07
Degree: Master
Type: Thesis
Country: China
Candidate: X Liu
Full Text: PDF
GTID: 2178360242980630
Subject: Computer application technology
Abstract/Summary:
The World Wide Web (WWW) was invented by Tim Berners-Lee in 1989. Over the years it has moved from Web 1.0 to Web 2.0. In Web 1.0, people could only browse websites; in Web 2.0, they can also publish resources (HTML, photos, video, music, etc.) to the network. The WWW is now a huge global information repository, but the quantity of information keeps growing, which makes it difficult for users to find the resources they want. Tim Berners-Lee therefore introduced the concept of the Semantic Web in 1998. In the Semantic Web, resources carry machine-readable information about their meaning (semantics), so that applications can search and process them automatically. At a conference at the University of Southampton he also pointed out that the Semantic Web needs a uniform format in which each database can represent its data, so that the data can be merged and made public. People will not see the advantage of the Semantic Web while the databases on the network remain separated from each other. The final goal of the Semantic Web is to turn all human knowledge into one huge network that computers can process automatically.
Generally, different databases on the network have different schemas and identifiers. To merge and integrate them, one must know the meaning of the data they contain. Interoperation between databases (structured, semi-structured or unstructured) is therefore the key point of the Semantic Web. The goal of interoperation is to let data be used by applications other than the ones that own it. Simply using ontologies does not reduce heterogeneity: it raises the heterogeneity problem to a higher level, and this is one of the main problems holding the Semantic Web back.
Matching is a promising solution to the semantic heterogeneity problem. It comprises schema-based matching and instance-based matching. Many matching solutions have been proposed so far, but most of them are schema-based, such as the lexicon-based matching approach, the SAT-based matching approach, the semantic matching approach, the ONION system, the LOM system and the PROMPT system. Instance-based solutions have received little attention.
This thesis introduces the Okkam system, proposed by Prof. Paolo Bouquet at the University of Trento in Italy, to solve the instance-based matching and merging problem. The main principle of Okkam is that entities which represent the same instance resource in different databases are identified by one global, uniform identifier. When integrating different databases, we can therefore confirm that entities match by comparing their URIs. The overall goal of the OKKAM initiative at the University of Trento is to enable the Web of Entities, a global digital space for publishing and managing information about entities, where every entity is uniquely identified and links between entities can be explicitly specified and exploited in a variety of scenarios.
Compared to the WWW, the Web of Entities differs in two main ways: the domain of entities is extended beyond the realm of digital resources to include objects in other realms, such as products, organizations, associations, countries, events, publications, hotels or people; and links between entities are extended beyond hyperlinks to include virtually any type of relation.
This thesis designs and implements the internals of the Okkam system, including the data structure and the procedures for searching and publishing entities. It then develops two applications based on the Okkam system, Okkam4P and FOAF-O-Matic, and reports the results of using these two applications, which indicate that they can resolve the instance-based matching and merging problem in the Semantic Web effectively.
Okkam4P is a plug-in for Protégé. It assigns a globally unique identifier (called an Okkam ID) to a newly created individual, rather than relying on manual input from the user or on the standard automatic mechanism of Protégé.
FOAF-O-Matic is a web-based application. Its focus is to let users integrate Okkam identifiers into their FOAF documents in a user-friendly way. This makes it possible to merge a larger number of FOAF graphs describing a person's social networks more precisely, which enhances the integration of information and brings the goal of the FOAF initiative within easier reach.
The research and development of the Okkam system and the two applications are still at an early stage, and many points and problems remain to be solved. This is a promising field, and a working Okkam system is expected to accelerate the success of the Semantic Web.
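The identifier-based principle described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the record layout, the `okkam_id` field name, the `example.org` URIs and both function names are assumptions made for illustration only. The sketch shows that once two sources share a global identifier for an entity, instance matching reduces to comparing identifiers, and merging reduces to grouping records by identifier.

```python
from collections import defaultdict

# Hypothetical records from two independent sources. The "okkam_id" field and
# the example.org URIs are illustrative placeholders, not real Okkam identifiers.
SOURCE_A = [
    {"okkam_id": "http://example.org/entity/1", "name": "Alice Smith",
     "mbox": "alice@example.org"},
    {"okkam_id": "http://example.org/entity/2", "name": "Bob Jones"},
]
SOURCE_B = [
    {"okkam_id": "http://example.org/entity/1", "name": "A. Smith",
     "homepage": "http://example.org/~alice"},
    {"okkam_id": "http://example.org/entity/3", "name": "Carol White"},
]


def matched_identifiers(source_a, source_b):
    """Return the identifiers that occur in both sources (the matched entities)."""
    ids_a = {record["okkam_id"] for record in source_a}
    ids_b = {record["okkam_id"] for record in source_b}
    return ids_a & ids_b


def merge_by_identifier(*sources):
    """Group records by identifier, keeping every distinct value per property."""
    merged = defaultdict(lambda: defaultdict(set))
    for source in sources:
        for record in source:
            entity_id = record["okkam_id"]
            for prop, value in record.items():
                if prop != "okkam_id":
                    merged[entity_id][prop].add(value)
    # Convert the nested sets to sorted lists so the result is deterministic.
    return {
        entity_id: {prop: sorted(values) for prop, values in props.items()}
        for entity_id, props in merged.items()
    }


MERGED = merge_by_identifier(SOURCE_A, SOURCE_B)
```

In this sketch, conflicting values for the same property (here, two spellings of a name) are kept side by side rather than reconciled; how such conflicts should be handled is a separate design decision that the abstract does not specify.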
Keywords/Search Tags:Instance-based