Domain-driven Web Resource Acquisition

Posted on:2011-01-06

Degree:Master

Type:Thesis

Country:China

Candidate:X R Lai

Full Text:PDF

GTID:2178330338989199

Subject:Computer application technology

Abstract/Summary:


As the Internet grows, content and formats vary from site to site. Although web resources are all projections of the real world, each is created by a resource creator, such as a web site editor, according to that creator's own conceptual world. Because opinions differ so widely, a single thing may therefore be described in many different ways. The retriever, on the other hand, tries to gather as much information as possible about that thing from the vast ocean of web resources. Creation and retrieval are thus reverse processes.

Deep Web content makes up a growing share of web resources. It resides in dynamic pages that are generated in response to queries submitted through a form. Traditional search engine crawlers cannot reach this content because no hyperlinks point to its URLs, so valuable information remains invisible to web users. Since the early 21st century, much research has sought to improve this situation. To make full use of the existing infrastructure, this research has focused on crawling web data by submitting forms to the web server with the GET method. However, the data obtained this way are independent of each other.

This paper addresses the issue by introducing a domain ontology. A meta search, combined with web page analysis, is used to find relevant pages that contain a query form. Using knowledge of the specific domain and the results of this pre-analysis, the form is filled in and submitted to the server. The web server then retrieves the data that match the query and constructs the source code of a web page, which it returns to the user as the response. The target data are extracted from the response page using certain algorithms and tools. The data are then mapped to the domain schema with the help of web resource and ontology mapping rules so that they can be stored in database tables. In addition, different data can be connected through the domain ontology.

A prototype system was built to verify the method, and experiments were carried out to measure its performance, including the harvest rate and the number of queries sent to the server.
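The form-filling and schema-mapping steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the ontology fragment `FIELD_SYNONYMS`, the helper names `match_concept`, `build_get_url` and `map_to_schema`, and the example URL are all hypothetical, and a plain synonym table stands in for a real domain ontology and its mapping rules.

```python
from urllib.parse import urlencode

# Hypothetical ontology fragment: domain concept -> labels seen on web pages.
FIELD_SYNONYMS = {
    "title": {"title", "book name", "name"},
    "author": {"author", "writer"},
}


def match_concept(label):
    """Return the domain concept whose synonyms contain the label, or None."""
    norm = label.strip().lower()
    for concept, synonyms in FIELD_SYNONYMS.items():
        if norm in synonyms:
            return concept
    return None


def build_get_url(action, form_fields, query):
    """Fill a query form and encode it as a GET request URL.

    form_fields maps each input name to its visible label (the result of
    page pre-analysis); query maps domain concepts to the values to search.
    """
    params = {}
    for name, label in form_fields.items():
        concept = match_concept(label)
        if concept in query:
            params[name] = query[concept]
    return action + "?" + urlencode(params)


def map_to_schema(record):
    """Rename extracted labels to domain concepts; drop unmatched labels."""
    mapped = {}
    for label, value in record.items():
        concept = match_concept(label)
        if concept is not None:
            mapped[concept] = value
    return mapped


# Assumed example form: two inputs that match the ontology, one that does not.
url = build_get_url(
    "http://example.com/search",
    {"q_t": "Book Name", "q_a": "Writer", "x": "Price"},
    {"title": "Ontology", "author": "Lai"},
)
row = map_to_schema({"Writer": "X R Lai", "Book Name": "Ontology", "Price": "9"})
```

Because records from different sites are renamed to the same concepts, rows produced by `map_to_schema` share column names and can be stored in one table, which is the sense in which the ontology connects otherwise independent data.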

Keywords/Search Tags:

web resource acquisition, deep web, domain-driven, ontology


Related items

1	Research On Domain Ontology-Based Knowledge Acquisition And Reuse
2	Research On Domain Ontology Concept And Relation Acquisition
3	The Research Of Term And Relation Acquisition Methods For Domain Ontology Learning
4	Research And Implementation On Semi-Automatic Domain Ontology Acquisition Method
5	Research Of Collection Of Web Information Based On Domain Ontology
6	Research On Ontology-Driven Archival Domain Knowledge Resources Organization
7	Application Of Building Domain Ontology In Learning Resource Management
8	A Research On Methods Of Knowledge Acquisition From Domain-Specific Texts And Their Application In Knowledge Acquisition From Archaeological Texts
9	Research On Domain Ontology-based Semantic Retrieval Method And Its Application
10	Research On Method Of Data Sources Selection And Constructing Domain Ontology