| This paper introduces a Semantic Web based full-text search engine for XML documents. The engine searches not only the content of the documents but also their structure, such as elements, attributes, and the relations between them. Queries are written in a reduced XPath syntax, so a search returns the matching node or node set. The engine also looks inside files such as PDF and JPEG for an embedded XMP packet. XMP is a metadata specification published by Adobe, and an XMP packet is an XML document fragment embedded in another file that generally describes the metadata of its host file. The engine was originally designed to search the Semantic Web for XML documents containing DC, PRISM, and XMP metadata, but it is inherently extensible to other files containing other metadata. The system keeps a list of namespaces (NS), and every element or attribute whose namespace is in the list can be indexed. The system manager configures this list to control which elements and attributes are indexed, and can also configure the system to index all namespaces, including the null namespace. In this paper we first introduce the history of the Semantic Web, its architecture, and key technologies such as XML, RDF(S), and ontologies. We also introduce metadata standards such as DC, PRISM, and XMP, with particular attention to the XMP packet technology. Second, we review classical search engine technology on the classical Web, including its classification, performance criteria, important components, and trends. Finally, we present our research, design, and implementation of the search engine described above, together with some visualizations and improvements. |
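
The namespace list described above can be illustrated with a minimal sketch. It is not the engine's implementation: the function names `split_name` and `indexable_nodes`, the sample document, and the use of Python's standard `xml.etree.ElementTree` module are assumptions made only for illustration. The sketch selects the elements and attributes whose namespace appears in a configured list, with `None` standing in for the null namespace.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace, used here as an example list entry.
DC_NS = "http://purl.org/dc/elements/1.1/"


def split_name(name):
    """Split ElementTree's '{uri}local' form into (uri, local).

    The uri is None when the name has no namespace.
    """
    if name.startswith("{"):
        uri, local = name[1:].split("}", 1)
        return uri, local
    return None, name


def indexable_nodes(xml_text, allowed_ns):
    """Yield (kind, namespace, local_name, value) for every element or
    attribute whose namespace is in allowed_ns (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        ns, local = split_name(elem.tag)
        if ns in allowed_ns:
            yield ("element", ns, local, (elem.text or "").strip())
        for attr_name, value in elem.attrib.items():
            a_ns, a_local = split_name(attr_name)
            if a_ns in allowed_ns:
                yield ("attribute", a_ns, a_local, value)


# Hypothetical sample document mixing DC and un-namespaced nodes.
SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/" id="r1">
  <dc:title>Semantic Web search</dc:title>
  <note>not indexed by default</note>
</record>"""

# Only DC nodes are selected when the list contains just the DC namespace.
dc_only = list(indexable_nodes(SAMPLE, {DC_NS}))

# Adding None to the list also selects nodes in the null namespace.
with_null = list(indexable_nodes(SAMPLE, {DC_NS, None}))
```

In this sketch, restricting the list to the DC namespace selects only the `dc:title` element, while adding the null namespace also selects the `record` and `note` elements and the `id` attribute.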