
The Research Of Data Extraction And Semantic Annotation In Deep Web

Posted on: 2010-04-08 | Degree: Master | Type: Thesis
Country: China | Candidate: Y G Wei
GTID: 2178360302961993 | Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer and network technologies, the Internet plays a more important role in daily life than ever before. Online search has become one of the main ways people obtain information, and search engines are currently the most commonly used tool. Because of technical limitations, however, most mainstream search engines can only return the surface content of static Web pages; they cannot directly retrieve information stored in Web databases. Research on how to use the resources of the Deep Web is therefore worthwhile.

The Deep Web is defined as the aggregate of resources held in Web databases that cannot be reached by following hyperlinks. Obtaining these resources requires a Deep Web data integration system. Its model for processing query results has two parts: data extraction and semantic annotation. Data extraction pulls the information out of a Web page by technical means and saves it in XML format or in a relational schema, as the basis for further processing. Semantic annotation adds semantic labels to the extracted data so that computers can recognize it more easily and make better use of it.

This thesis performs data extraction using XPath and then proposes a Deep Web semantic annotation method based on Chinese part-of-speech tagging and domain knowledge. In the XPath extraction method, the search results are first standardized into XML; next, a path expression for the data to be extracted is built while traversing the XML document; finally, the required data is extracted according to that expression and saved in XML format. Semantic annotation then attaches semantic information to the extracted data.
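The three XPath extraction steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the result page is assumed to be already standardized into well-formed XML (a tool such as HTML Tidy could perform that step), the sample page, the field names `title`, `author` and `price`, and the helpers `find_path`, `build_expressions` and `extract` are all hypothetical, and Python's `xml.etree.ElementTree`, which supports only a subset of XPath, stands in for a full XPath engine. The user supplies the field values of one sample record; a depth-first traversal records a positional path to each value, the shared prefix (with its last index dropped) becomes the record path, and the remaining steps become per-field relative paths.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical query-result page, assumed to be already standardized into well-formed XML.
PAGE = """<html><body>
<h1>Search results</h1>
<table>
  <tr><td>Data Mining</td><td>Han</td><td>45.00</td></tr>
  <tr><td>Deep Web Search</td><td>Li</td><td>38.50</td></tr>
</table>
</body></html>"""


def find_path(root, text):
    """Depth-first traversal returning the path steps (e.g. ['body[1]', 'table[1]',
    'tr[1]', 'td[1]']) to the first element whose stripped text equals `text`."""
    def walk(elem, steps):
        if (elem.text or "").strip() == text:
            return steps
        seen = {}  # position among siblings with the same tag, as in XPath tag[n]
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            found = walk(child, steps + [f"{child.tag}[{seen[child.tag]}]"])
            if found is not None:
                return found
        return None
    return walk(root, [])


def build_expressions(root, sample):
    """Derive a record path and per-field relative paths from ONE sample record.
    Assumes at least two sample fields, all present in the page."""
    paths = {field: find_path(root, value) for field, value in sample.items()}
    prefix = []
    for steps in zip(*paths.values()):
        if len(set(steps)) != 1:
            break
        prefix.append(steps[0])
    # Drop the positional index on the record step so the path matches every record.
    record_path = "/".join(prefix[:-1] + [re.sub(r"\[\d+\]$", "", prefix[-1])])
    fields = {f: "/".join(p[len(prefix):]) for f, p in paths.items()}
    return record_path, fields


def extract(root, record_path, fields):
    """Apply the path expressions and save the extracted data as a new XML document."""
    out = ET.Element("records")
    for rec in root.findall(record_path):
        item = ET.SubElement(out, "record")
        for field, rel in fields.items():
            node = rec.find(rel)
            ET.SubElement(item, field).text = node.text if node is not None else ""
    return ET.tostring(out, encoding="unicode")


root = ET.fromstring(PAGE)
record_path, fields = build_expressions(
    root, {"title": "Data Mining", "author": "Han", "price": "45.00"})
print(record_path)  # body[1]/table[1]/tr
print(extract(root, record_path, fields))
```

Dropping the index from the record step (`tr[1]` becomes `tr`) is what lets a single sample record generalize to every row of the result list; the sketch finds the record element as the common ancestor of the sample fields, which is why it needs at least two of them.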
In the semantic annotation method, a Chinese word segmentation tool is used to obtain the parts of speech of the words in Deep Web query results, and mapping rules between part-of-speech combinations and their semantic meanings are built on that basis. Domain knowledge is also used to carry out the semantic annotation. Experimental results show that the method can semantically annotate Deep Web query results, which verifies its effectiveness.
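The combination of part-of-speech mapping rules and domain knowledge can be illustrated with a small sketch. Everything here is an assumption for illustration only: the segmenter output is hard-coded as (word, tag) pairs using PKU/ICTCLAS-style tags rather than produced by a real Chinese segmentation tool, and the rule table `POS_RULES`, the domain tables `DOMAIN_UNITS` and `DOMAIN_SUFFIXES`, the semantic labels, the `annotate` helper and the title fallback are hypothetical, not the rules defined in the thesis.

```python
# Hypothetical segmenter output for one extracted record: each field value is a
# list of (word, POS tag) pairs using PKU/ICTCLAS-style tags
# (n = noun, vn = verbal noun, nr = person name, m = numeral, q = measure word, t = time word).
RECORD = [
    [("数据", "n"), ("集成", "vn"), ("技术", "n")],
    [("张三", "nr")],
    [("科学", "n"), ("出版社", "n")],
    [("45.00", "m"), ("元", "q")],
    [("2009年", "t")],
]

# Hypothetical mapping rules: POS combination -> semantic label.
POS_RULES = {
    ("nr",): "author",
    ("t",): "date",
    ("m", "q"): "quantity",
}

# Hypothetical domain knowledge for a book-search domain.
DOMAIN_UNITS = {"元": "price", "页": "pages"}   # measure word -> specific quantity
DOMAIN_SUFFIXES = {"出版社": "publisher"}       # head word -> concept


def annotate(value):
    """Return (text, semantic label) for one segmented field value."""
    words = [w for w, _ in value]
    tags = tuple(t for _, t in value)
    label = POS_RULES.get(tags)
    # Domain knowledge refines a generic quantity using its measure word.
    if label == "quantity":
        label = DOMAIN_UNITS.get(words[-1], label)
    # Domain knowledge also labels values that no POS rule matches.
    if label is None and words:
        label = next((lab for suffix, lab in DOMAIN_SUFFIXES.items()
                      if words[-1].endswith(suffix)), None)
    # Assumed fallback: an unmatched phrase containing a plain noun is taken as the title.
    if label is None and "n" in tags:
        label = "title"
    return "".join(words), label or "unknown"


for value in RECORD:
    print(annotate(value))
```

On the hypothetical record, the sketch labels the five values as title, author, publisher, price and date; the price and publisher labels come from the domain tables rather than from the POS rules alone, which is the kind of gap domain knowledge is meant to fill.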
Keywords: Deep Web, Data Extraction, Semantic Annotation