Research On Semantic Web-based Web Information Query

Posted on: 2009-08-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Chen
Full Text: PDF
GTID: 2178360272976539
Subject: Computer software and theory
Abstract/Summary:
The emergence of the World Wide Web has changed the ways in which people obtain information. With its rapid development, the amount of information is growing exponentially, which provides plenty of data but also makes it harder for people to retrieve what they need quickly and accurately. How to retrieve accurate and useful information from this data has become an urgent issue.

Prevailing search engines partly solve the problem. However, contemporary search engines query data mostly by word matching, which leads to both low accuracy and low completeness. They therefore cannot satisfy users who raise increasingly complicated queries. The reason the algorithms used by current search engines cannot go beyond word matching is that data on the WWW lacks formal structure and semantics.

To solve this problem, Tim Berners-Lee proposed the concept of the Semantic Web, in which data is embedded with metadata (i.e., semantics) that makes it more structured and machine-oriented, and therefore understandable and processable by computers. With semantics available, data processing no longer relies solely on vocabulary but can also use semantics to enrich its methods. Inspired by the concept of the Semantic Web, we add semantics to the original bare data to give it structure. First, this allows the semantics to be used to make queries more precise. Second, because the queried data carries semantics, data expressed in different vocabulary forms can be recognized as semantically equivalent, which makes queries more complete.

Precision and completeness are exactly where contemporary search engines fall short. In this paper, we therefore build a query model based on the Semantic Web. By building domain ontologies, semantics can be added to data. The knowledge base is then constructed from the semantically annotated data extracted from sources that lack structure.
Computers then query the knowledge base to achieve higher accuracy and completeness.

Ontology is an important concept in the Semantic Web, and most research on the Semantic Web is related to ontologies. In this paper, we first introduce the current state of research on Semantic Web-based information query, together with concepts and technologies related to semantic search, such as ontology languages, ontology construction, and knowledge base construction. A domain ontology-based information query model is then proposed. Based on this model, a Semantic Web-based information query system, SSED, is implemented around a vertical web site. The method is to build a domain ontology, convert the data into ontology instances according to it to form a knowledge base, and then perform queries on this knowledge base.

In its design, the system is divided into several layers to reduce complexity. It can then be viewed as a combination of three independent modules: the knowledge base, the query module, and the user interface. Each module provides service interfaces, and the modules communicate through message calls on those interfaces. The knowledge base stores the data to be queried; the query module queries the knowledge base according to conditions passed from the user interface; and the user interface communicates with clients by receiving query conditions and displaying query results.

The system is implemented in the Java programming language and is therefore independent of the operating system, so it can be deployed on various operating systems. To improve query performance, and because the Linux operating system can make full use of peripheral devices, the system is deployed on Linux, which offers excellent stability and security.
The security of the system and the stability of its run-time performance are thereby strengthened as well.

Protégé is used during implementation to design the ontologies. Once an ontology has been built, its definition and the instances constructed according to it are stored in an OWL file. The Jena API reads this file and stores its contents in a database to construct the knowledge base. Since the system is deployed on Linux, MySQL is chosen as the knowledge base repository.

The query module is further divided into the Query Server, which performs the actual query tasks, and the HTTP Server, which processes query requests and query results. The Query Server listens for query requests, analyzes them, and performs the queries; threads are used to implement these jobs to improve the concurrency of the system. Tomcat, an open source Servlet container, is chosen as the HTTP Server. Tomcat performs very well when handling Servlets and gives users great flexibility to customize their own extensions. The HTTP Server collects user input, assembles it into query requests that are sent to the Query Server, receives the query results, and sends them back to users.

To further reduce performance bottlenecks, the Query Server and the HTTP Server are designed so that they can be deployed on different platforms. The two modules therefore communicate through sockets, with data represented in the popular XML format. This requires a mechanism for converting between the XML data and the object data used in the programs. JAXB is employed to define the schema of the object data used in the programs and thus to handle this conversion.

Logs, which support debugging and later improvement, are known to be a necessary part of a system that runs stably and is easy to maintain.
log4j is used in the implementation of SSED for this purpose. Because log4j is an independent module, its use does not add complexity to the system.

The system implements Semantic Web-based information query and thereby effectively improves the precision and completeness of queries. The method overcomes weaknesses of current search engines and therefore has significant implications for future general-purpose semantic search. Building ontology-based information query models sheds light on future directions of information query and lays a foundation for the further development and application of the Semantic Web. However, realizing more general semantic query depends on the availability of general ontologies and on the maturity of Semantic Web technologies.
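
The contrast drawn above between word matching and querying over semantically annotated data can be illustrated with a small sketch. It is not the SSED implementation: the class `SemanticLookupSketch`, its methods, the concept name `PortableComputer`, and the sample titles are all hypothetical, and a plain term-to-concept map stands in for a domain ontology and its knowledge base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy contrast between word matching and concept-based lookup; all names are hypothetical. */
final class SemanticLookupSketch {
    // Stand-in for a domain ontology: each known surface term maps to one concept.
    private final Map<String, String> termToConcept = new HashMap<>();
    // Stand-in for the knowledge base: concept -> titles annotated with that concept.
    private final Map<String, Set<String>> instancesByConcept = new HashMap<>();
    private final List<String> rawTitles = new ArrayList<>();

    /** Registers a term for a concept. Call before add(), since annotation happens on insert. */
    void defineTerm(String term, String concept) {
        termToConcept.put(term.toLowerCase(Locale.ROOT), concept);
    }

    /** Stores a title and annotates it with the concept of every known term it contains. */
    void add(String title) {
        rawTitles.add(title);
        for (String word : tokens(title)) {
            String concept = termToConcept.get(word);
            if (concept != null) {
                instancesByConcept.computeIfAbsent(concept, k -> new LinkedHashSet<>()).add(title);
            }
        }
    }

    /** Word matching: returns only titles that literally contain the query word. */
    List<String> keywordQuery(String word) {
        String needle = word.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String title : rawTitles) {
            if (tokens(title).contains(needle)) {
                hits.add(title);
            }
        }
        return hits;
    }

    /** Concept lookup: resolves the word to a concept, then returns every annotated title. */
    List<String> semanticQuery(String word) {
        String concept = termToConcept.get(word.toLowerCase(Locale.ROOT));
        if (concept == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(instancesByConcept.getOrDefault(concept, new LinkedHashSet<>()));
    }

    /** Splits text into lower-case word tokens. */
    private static List<String> tokens(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+"));
    }

    public static void main(String[] args) {
        SemanticLookupSketch kb = new SemanticLookupSketch();
        kb.defineTerm("laptop", "PortableComputer");
        kb.defineTerm("notebook", "PortableComputer");
        kb.add("Lightweight laptop with long battery life");
        kb.add("Business notebook on sale");
        kb.add("Desk lamp");
        System.out.println(kb.keywordQuery("laptop"));
        System.out.println(kb.semanticQuery("laptop"));
    }
}
```

In this toy setting the keyword query for "laptop" misses the title that says "notebook", while the concept lookup returns both, which is the completeness gain the abstract attributes to semantic equivalence.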
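
The conversion between XML messages and program objects that the HTTP Server and Query Server need can be sketched as follows. The abstract names JAXB for this step; because JAXB no longer ships with recent JDKs, this sketch substitutes the JDK's built-in DOM parser so that it runs without extra dependencies. The message layout (a `query` element with a `maxResults` attribute and a `concept` child) and the class `QueryRequestSketch` are assumptions made for illustration, not the format SSED uses.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical query message passed between the HTTP Server and the Query Server. */
final class QueryRequestSketch {
    final String concept;  // assumed field: the ontology concept the user asks about
    final int maxResults;  // assumed field: upper bound on returned instances

    QueryRequestSketch(String concept, int maxResults) {
        this.concept = concept;
        this.maxResults = maxResults;
    }

    /** Object -> XML: serializes the request into the assumed message layout. */
    String toXml() {
        return "<query maxResults=\"" + maxResults + "\"><concept>"
                + escape(concept) + "</concept></query>";
    }

    /** XML -> object: parses the assumed message layout with the JDK's DOM parser. */
    static QueryRequestSketch fromXml(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            String concept = root.getElementsByTagName("concept").item(0).getTextContent();
            int maxResults = Integer.parseInt(root.getAttribute("maxResults"));
            return new QueryRequestSketch(concept, maxResults);
        } catch (Exception e) {
            // Malformed or incomplete messages are reported as one unchecked exception.
            throw new IllegalArgumentException("Not a valid query message: " + xml, e);
        }
    }

    /** Escapes the characters that would otherwise break element content. */
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        QueryRequestSketch sent = new QueryRequestSketch("PortableComputer", 10);
        String wire = sent.toXml();  // what would be written to the socket
        QueryRequestSketch received = QueryRequestSketch.fromXml(wire);
        System.out.println(wire);
        System.out.println(received.concept + " / " + received.maxResults);
    }
}
```

With JAXB, the same round trip would typically be driven by annotated classes or a schema rather than hand-written parsing; the hand-written version here only makes the two directions of the conversion explicit.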
Keywords/Search Tags: semantic web, ontology, search engine, information query