
Study On The Knowledge Processing Based On Semantic Web

Posted on: 2006-04-08    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Y L Yan    Full Text: PDF
GTID: 1118360182965723    Subject: Information Science
Abstract/Summary:
The Web has attracted, and will continue to attract, ever more users thanks to its abundant resources and powerful functions, carrying network applications into organizations and individual life. Nevertheless, current search engines are keyword-based and often return unsatisfactory results. Knowledge workers must spend large amounts of time browsing and reading information on the Web to discover how documents relate to each other; only after identifying the similarities and differences among pieces of information can they construct the relations needed to create new knowledge.

The Semantic Web is an extension of the existing Web. Today, Web content is designed for humans, not machines, to understand. Semantic Web content is designed according to a semantic model so that it can be understood not only by people but also, more importantly, by machines. The Semantic Web thus provides a powerful basis for integrating distributed Web resources into coherent, interrelated information. Each piece of Web content may have a corresponding, parallel semantic content: a description and representation of knowledge about that content and of the relations among knowledge that may come from different data sources. In a sense, it is the use of ontologies on the Web that makes the Semantic Web possible. A domain ontology specifies the objects in a domain, the relations among them, and their properties; a body of formal knowledge about such entities is called a knowledge base (KB). Knowledge processing in this dissertation means not only representing, indexing, and retrieving knowledge about Web content and the relations among knowledge, but also automatically interpreting, exchanging, integrating, and reasoning over the generated semantic content.
Besides text and multimedia resources, Web content also includes the Web service resources that abound on the Web.

Building on a thorough review and analysis of the domestic and international state of research, this dissertation studies the systematic relations between the Semantic Web and knowledge processing comprehensively and in depth. It applies Semantic Web technology to the knowledge processing of three main kinds of resources (text, multimedia, and services) and compares the performance of keyword-based indexing and retrieval with that of semantic indexing and retrieval. The dissertation is divided into five sections.

The first section covers Semantic Web theory and technology. XML brought new hope for the Semantic Web. Tim Berners-Lee, who proposed the Semantic Web, sees its goal as creating representation languages that describe information in a machine-understandable form, and summarizes its functional framework as a metadata layer, a schema layer, and a logical layer. On the Semantic Web, XML plus XML Schema specifies syntax, structure, and data types, but lacks semantic constraints. RDF (Resource Description Framework) is a language for representing information about resources on the World Wide Web. It represents metadata of Web resources, such as the title, author, modification date, copyright, and registration information of Web documents, as well as their language, format, and content items. RDF Schema does not provide a vocabulary of application-specific classes and properties; instead, it provides the facilities needed to describe such classes and properties and indicates which classes and properties are expected to be used together. OWL (Web Ontology Language) is designed not only to present information but also to process its content.
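At bottom, the RDF metadata described above is a set of subject-predicate-object triples. The following minimal Python sketch illustrates the idea; the document URI and literal values are hypothetical examples, though the Dublin Core namespace is a real vocabulary:

```python
# Minimal sketch of the RDF triple model: metadata about a Web document
# stored as (subject, predicate, object) triples. The document URI and
# values are hypothetical illustrations.
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core namespace

triples = {
    ("http://example.org/doc1", DC + "title",   "Semantic Web Primer"),
    ("http://example.org/doc1", DC + "creator", "Y. L. Yan"),
    ("http://example.org/doc1", DC + "date",    "2006-04-08"),
}

def query(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [(ts, tp, to) for (ts, tp, to) in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

# All metadata recorded about doc1:
print(len(query(triples, s="http://example.org/doc1")))  # -> 3
```

Pattern matching with wildcards, as in `query`, is the essence of how RDF stores answer simple metadata questions.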
OWL provides a more formal semantic vocabulary, surpassing XML Schema and RDF Schema in the machine understanding of Web content.

The second section covers semantic integration, indexing, and retrieval, including semantic annotation and indexing, semantic retrieval, and search engines for Semantic Web documents. Annotation means attaching to the entities in a text links to their semantic descriptions. In traditional NLP (Natural Language Processing) and IE (Information Extraction), Named Entities (NEs) are people, organizations, locations, and other things referred to by name. Semantic annotation is a specific metadata generation and usage scheme intended to enable new information-access methods and extend existing ones. The annotation scheme presented here rests on the understanding that the NEs mentioned in text documents constitute important parts of their semantics; by exploiting various kinds of redundancy and external or background knowledge, these entities can be coupled with formal descriptions, providing richer semantics about the Web. In semantic annotation, the formal encoding of extracted knowledge should follow widely accepted knowledge-representation and metadata-encoding standards. A semantic retrieval scheme should treat semantic tags as structured information and reason over them according to RDF and OWL semantics. This dissertation puts forward a framework for a Semantic Web search engine built on current Web search engines and the Semantic Web. Such a framework must run an inference engine that recognizes the facts and rules needed to arrive at the expected results, for example by sifting facts and rules from the Semantic Web and merging the sifted results into the inference process. The Semantic Web comprises Web documents and the parallel Semantic Web documents (SWDs) that describe them, and the SWDs contain large numbers of ontologies and knowledge bases.
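The kind of reasoning such an inference engine performs can be illustrated with a tiny forward-chaining closure over RDFS-style facts: if an instance has type C and C is a subclass of D, the instance also has type D. The class names and the instance below are invented for illustration:

```python
# Sketch of RDFS-style inference: if X rdf:type C and C rdfs:subClassOf D,
# then X rdf:type D. The hierarchy and facts are illustrative only.
subclass_of = {            # class -> its direct superclasses
    "Researcher": {"Person"},
    "Person":     {"Agent"},
}
types = {"yan": {"Researcher"}}   # instance -> asserted classes

def infer_types(types, subclass_of):
    """Forward-chain until no new rdf:type facts are derivable."""
    inferred = {inst: set(cs) for inst, cs in types.items()}
    changed = True
    while changed:
        changed = False
        for inst, classes in inferred.items():
            new = set()
            for c in classes:
                new |= subclass_of.get(c, set())
            if not new <= classes:
                classes |= new
                changed = True
    return inferred

print(sorted(infer_types(types, subclass_of)["yan"]))
# -> ['Agent', 'Person', 'Researcher']
```

Sifting only the relevant subclass axioms into such a closure, rather than the whole Web's facts, is the "merging sifted results into the inference process" step described above.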
Reuse of ontologies and knowledge bases is an important area of knowledge technology, and its prerequisite is the ability to locate them. Engines dedicated to indexing and retrieving SWDs are called SWD search engines; they exploit the fact that the documents they encounter are designed for machine understanding and processing. Swoogle is a successful SWD search engine, used to discover suitable ontologies and instance data and to study the structure of the Semantic Web.

The third section covers the Semantic Web and multimedia. Multimedia content description serves to better organize and retrieve ever-growing multimedia resources. For a long time, research focused on feature-extraction techniques, which enable content description only at the perceptual layer: visual and audio features such as color, shape, and texture have concrete values and can be extracted, but they describe multimedia content at a low level, which conflicts with the way people actually think about and handle multimedia resources. MPEG-7 is an important international standard for multimedia content description. It can describe multimedia content at several levels of granularity and abstraction, covering structures, concepts, models, collections, regions, segments, objects, and events. The MPEG-7 DDL (Description Definition Language) is based on XML Schema, with some extensions for specifying Descriptors and Description Schemes. XML Schema can represent the required syntax, structure, and data types, but provides little semantic support for efficient and flexible mapping, integration, and knowledge acquisition. The Semantic Web, by contrast, can express the semantics and semantic relations of multimedia through class and property hierarchies.
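How low-level features might feed a higher-level semantic classification can be sketched with a toy rule of this kind. The feature names and thresholds below are invented assumptions in the spirit of rules over MPEG-7 visual Descriptors, not normative MPEG-7 definitions:

```python
# Toy sketch: classify an image's subject from low-level visual features,
# in the spirit of rules over MPEG-7 Descriptors. Feature names and
# thresholds are illustrative assumptions, not MPEG-7 definitions.
def classify_subject(descriptors):
    """Map a dict of low-level features to a coarse subject label."""
    dominant = descriptors.get("dominant_color")
    edges = descriptors.get("edge_density", 0.0)
    if dominant == "blue" and edges < 0.2:
        return "sky/sea scene"
    if dominant == "green" and edges < 0.4:
        return "vegetation scene"
    return "unclassified"

print(classify_subject({"dominant_color": "blue", "edge_density": 0.1}))
# -> sky/sea scene
```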
However, the property-centric nature of the Semantic Web makes it difficult to derive property definitions and domain constraints from the class-centric XML Schema definitions, so constructing a complete OWL ontology that reflects the MPEG-7 Descriptors and Description Schemes is no easy task. This dissertation probes the possibility of expressing the semantics of MPEG-7 Descriptors and Description Schemes on the Semantic Web. As a machine-processable ontology, a semantic representation of MPEG-7 would enable the construction of knowledge-based multimedia systems able to automatically extract and aggregate semantic information about audiovisual data, such as objects, events, properties, and relations. The extracted semantic metadata could then be used to categorize, index, and retrieve multimedia content; for example, given suitable inference rules, a well-structured ontology could infer the subject of an image or the type of a video from a combination of low-level MPEG-7 visual or audiovisual Descriptors.

The fourth section covers Semantic Web services. The core idea of Web services is to deliver software as a service: a Web service can be regarded as a technology that exposes an interface to a function so that other programs can access that function over the Web. Web services form a component framework through which Web applications invoke and integrate one another over the loosely coupled Internet. A Web service can be reached through a URL, but knowing the URL alone is not enough to invoke it: one must also know how to format requests so that the service does the expected work, which operations it supports, which parameters to supply, and what responses to expect, as well as the form of the exchanged data, including its types and usage sequence. WSDL (Web Services Description Language) can meet all of these needs.
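What a WSDL-style description must convey, and how a matchmaker could use it for discovery, can be sketched roughly as follows. All service names, operation names, and type names here are hypothetical:

```python
# Sketch of service descriptions and discovery: each service advertises
# operations with typed inputs and an output, and a client searches by
# signature. Names and types are invented for illustration.
services = [
    {"name": "WeatherService",
     "operations": [{"op": "getForecast", "inputs": ["City"], "output": "Forecast"}]},
    {"name": "GeoService",
     "operations": [{"op": "locate", "inputs": ["Address"], "output": "City"}]},
]

def discover(services, inputs, output):
    """Return services offering an operation satisfiable by the given inputs."""
    return [s["name"] for s in services
            for op in s["operations"]
            if set(op["inputs"]) <= set(inputs) and op["output"] == output]

print(discover(services, inputs=["City"], output="Forecast"))  # -> ['WeatherService']
```

With ontology-backed types as in OWL-S, the matchmaker could also accept a subclass of a requested type rather than demanding an exact string match, which is precisely the semantic step that plain WSDL cannot express.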
Nevertheless, Web services are described using XML Schema, which lacks semantic description, so Web services can readily be located and invoked by people but not by machine agents. To enable machine agents to discover and integrate Web services automatically, richer semantic information must be described. Researchers have developed languages, frameworks, and methods for this purpose, most notably OWL-S (OWL for Services), built on OWL. This dissertation investigates the discovery, integration, and interoperability of Web services based on the practice of XML Web services.

The fifth section is a case analysis of the Knowledge and Information Management (KIM) platform. KIM is an infrastructure providing semantic annotation, indexing, and retrieval services. An essential idea in KIM is semantic (or entity) annotation, which can be seen as a classical named-entity recognition and annotation process. In contrast to most existing IE systems, however, KIM attaches to each entity reference in the text a link (URI) to the most specific class in the ontology and a link to the specific instance in the knowledge base. KIM can therefore index and retrieve documents by entity, and it supports (semi-)automatic semantic annotation, ontology population, semantic indexing and retrieval, and formal knowledge retrieval and navigation.

The Semantic Web is a globally distributed knowledge base. Tim Berners-Lee seeks to create a networked knowledge ontology, describing the Semantic Web as an infrastructure able to learn from experience and to support knowledge acquisition, representation, and use across different application settings. Studying knowledge processing based on the Semantic Web would enable the organization, discovery, use, and integration of text, multimedia, and services on the Web. Ever since Apple's Knowledge Navigator concept of the 1980s, humanity has cherished the dream of a "knowledge roadmap".
The global knowledge base brought by the Semantic Web would carry this dream a great leap forward, offering individuals, organizations, and society as a whole benefits and a future beyond prediction.
Keywords/Search Tags: Semantic Web, knowledge processing, ontology, multimedia content description, Web service