
An Approach For Semantic Annotation Based On The Edit Distance And The Google Distance

Posted on: 2011-11-03  Degree: Master  Type: Thesis
Country: China  Candidate: D M Ai  Full Text: PDF
GTID: 2178360308958891  Subject: Computer system architecture
Abstract/Summary:
With the rapid development of Internet technology, the web has become one of the main ways for people to obtain information and services, and it has also become an enormous knowledge base. At the same time, many problems have arisen. First, as the web grows, the useful information on it cannot be used effectively, so it is difficult for people to gain "knowledge". Second, the computer mainly plays the role of displaying information and hardly processes it, which makes intelligent applications impractical. The Semantic Web was therefore proposed to give web documents machine-understandable semantics, so that data can be shared and reused across platforms and computers can communicate and cooperate with people. In this way, the enormous potential of the web can be realized.

Machine-understandable semantic information is produced and added to web resources through semantic annotation. Semantic annotation describes the instance data in a webpage with ontology knowledge and maps it to ontology classes. It gives web resources their semantics and is therefore the foundation for realizing the Semantic Web.

This paper begins with an introduction to the Semantic Web, including its origin, its architecture, ontology, and so on. It also explains the meaning of semantic annotation, how ontology and annotation work together, and the role of semantic annotation in realizing the Semantic Web.

It then analyses and compares several semantic annotation tools. To address the shortcomings of these existing tools, it introduces the ideas of syntactic and semantic analysis and proposes an approach to semantic annotation based on the Edit Distance and the Google Distance.
This method is guided by a domain ontology. It takes both the grammatical similarity and the semantic relevance between web resources and the ontology into account to measure how relevant the resources are to the ontology, and can thus tag the resources with ontology concepts. It also post-processes the annotation results according to their relevance weight: undefined instances with a high relevance weight are added to the ontology to enrich it, while instances with a low relevance weight are added to the word filter table to help the preprocessing of other documents. Both are circular feedback processes.

Secondly, an analysis of traditional semantic annotation tools shows that they hardly ever annotate Word documents directly. Therefore, after analyzing the features of Word documents, we improve the approach based on the Edit Distance and the Google Distance so that it can annotate Word documents directly.

Finally, to validate the performance of the method, the author designed and carried out an experiment in the wine domain, testing web documents and Word documents separately. The experimental results show that the solution is feasible and effective, places no restriction on how a document is expressed, and supports the annotation of Word documents, which makes up for the deficiencies of traditional annotation tools.
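
The abstract names the two measures but gives no formulas, so the following is only a minimal sketch of how a syntactic score and a semantic score might be combined into one relevance weight. It uses the standard Levenshtein edit distance and the standard Normalized Google Distance formula. The function names, the linear weighting `alpha`, and the hit counts passed in by the caller are all assumptions made for illustration and are not taken from the thesis.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a.lower(), b.lower()) / longest


def normalized_google_distance(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized Google Distance from page counts.

    fx, fy: hits for each term alone; fxy: hits for both terms together;
    n: assumed total number of indexed pages (must exceed fx and fy).
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf  # the terms never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def relevance(term: str, concept: str, fx: int, fy: int, fxy: int,
              n: int, alpha: float = 0.5) -> float:
    """Hypothetical relevance weight: a linear mix of the two scores.

    The weighting scheme is an assumption, not the thesis's formula.
    """
    syntactic = edit_similarity(term, concept)
    semantic = max(0.0, 1.0 - normalized_google_distance(fx, fy, fxy, n))
    return alpha * syntactic + (1.0 - alpha) * semantic
```

In a pipeline like the one described, a candidate term from a wine-domain document would be scored against each ontology concept, with the hit counts supplied by whatever search index is available. Terms whose weight falls above or below chosen thresholds could then be routed to the ontology or to the word filter table, in line with the feedback steps described above.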
Keywords/Search Tags: Semantic Web, Semantic Annotation, Edit Distance, Google Distance