
Domain-driven Web Resource Acquisition

Posted on: 2011-01-06; Degree: Master; Type: Thesis
Country: China; Candidate: X R Lai
GTID: 2178330338989199; Subject: Computer application technology
Abstract/Summary:
As the Internet grows, the content and formats of web resources vary from site to site. Although all web resources are projections of the real world, each is created by a resource creator, such as a website editor, according to that creator's own conceptual world. Because opinions differ widely, a single thing may therefore be described in many different ways. A retriever, on the other hand, tries to gather as much information as possible about that thing from the vast ocean of web resources. Creation and retrieval are thus inverse processes.

Deep Web content now makes up a growing share of web resources. It resides in dynamic pages that are generated in response to queries submitted through web forms. Traditional search engine crawlers cannot reach this content because no hyperlinks point to its URLs, so much valuable information remains invisible to web users. Since the early 21st century, a considerable body of research has tried to address this. To make full use of existing infrastructure, that research has concentrated on crawling web data by submitting forms to the web server with the HTTP GET method. However, the data acquired in this way remain isolated from one another.

This thesis addresses the problem by introducing a domain ontology. A meta-search is used to find relevant pages, and web page analysis identifies those that contain a query form. Using knowledge of the specific domain together with this pre-analysis, the form is filled in and submitted to the server. The web server then retrieves the data matching the query, generates the HTML source of a response page, and returns it to the user. The target data are extracted from the response page using dedicated algorithms and tools. Finally, the data are mapped to the domain schema by means of mapping rules between web resources and the ontology, so that they can be stored in database tables.
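The form-filling, submission, and schema-mapping steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the form fields, the candidate values in `DOMAIN_VALUES` (standing in for domain knowledge), the URL `http://example.com/search`, the labels in `MAPPING_RULES`, and the `book` table are all hypothetical. Page analysis and result extraction are assumed to have already produced the form's action URL and the label/value records, and the sketch only builds the GET URLs without sending them.

```python
import sqlite3
from itertools import product
from urllib.parse import urlencode

# Hypothetical domain knowledge: candidate values for each form field,
# standing in for values drawn from a domain ontology (a toy book domain).
DOMAIN_VALUES = {
    "author": ["Knuth", "Lamport"],
    "category": ["algorithms"],
}

# Hypothetical mapping rules: site-specific result labels -> domain schema columns.
MAPPING_RULES = {
    "Book Title": "title",
    "Writer": "author",
    "Cost": "price",
}

# Columns of the hypothetical domain schema table.
SCHEMA_COLUMNS = ("title", "author", "price")


def build_queries(action_url, field_values):
    """Fill the form with every combination of candidate values and return
    one GET request URL (action URL plus encoded query string) per combination.
    The URLs are only built here, not sent."""
    names = sorted(field_values)
    urls = []
    for combo in product(*(field_values[n] for n in names)):
        urls.append(action_url + "?" + urlencode(dict(zip(names, combo))))
    return urls


def map_record(extracted, rules):
    """Rename the labels of one extracted record to domain schema columns,
    dropping labels that have no mapping rule."""
    return {rules[label]: value for label, value in extracted.items() if label in rules}


def store(conn, rows):
    """Insert mapped records into the domain table; missing columns become NULL."""
    conn.execute("CREATE TABLE IF NOT EXISTS book (title TEXT, author TEXT, price TEXT)")
    conn.executemany(
        "INSERT INTO book (title, author, price) VALUES (?, ?, ?)",
        [tuple(row.get(col) for col in SCHEMA_COLUMNS) for row in rows],
    )
    conn.commit()


if __name__ == "__main__":
    for url in build_queries("http://example.com/search", DOMAIN_VALUES):
        print(url)
    # Pretend the extraction step produced this record from a response page.
    record = {"Book Title": "TAOCP", "Writer": "Knuth", "ISBN": "n/a"}
    row = map_record(record, MAPPING_RULES)
    print(row)  # {'title': 'TAOCP', 'author': 'Knuth'}
    conn = sqlite3.connect(":memory:")
    store(conn, [row])
    print(conn.execute("SELECT * FROM book").fetchall())  # [('TAOCP', 'Knuth', None)]
```

Enumerating every combination of candidate values is the simplest possible query-selection strategy. A practical crawler would likely choose values more selectively, since each combination costs one request to the server.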
In addition, data from different sources can be linked through the domain ontology. A prototype system was built to validate the method, and experiments measured its performance in terms of harvest rate and the number of queries sent to the server.
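The two reported measures could be computed from a crawl log roughly as below. The definition of harvest rate here is an assumption: it borrows the focused-crawling idea of "useful pages over fetched pages" and counts a query as productive if its response page yields at least one record not seen in an earlier response. The thesis may define it differently. `QueryResult` and `crawl_stats` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class QueryResult:
    """One submitted query and the records extracted from its response page
    (hypothetical structure)."""
    url: str
    records: list = field(default_factory=list)  # hashable records, e.g. tuples


def crawl_stats(results):
    """Return (queries_sent, harvest_rate, distinct_records).

    Assumed definition: harvest rate = productive queries / queries sent,
    where a query is productive if its response page yields at least one
    record not seen in any earlier response.
    """
    seen = set()
    productive = 0
    for result in results:
        new = [rec for rec in result.records if rec not in seen]
        seen.update(new)
        if new:
            productive += 1
    queries_sent = len(results)
    harvest_rate = productive / queries_sent if queries_sent else 0.0
    return queries_sent, harvest_rate, len(seen)


if __name__ == "__main__":
    log = [
        QueryResult("q1", [("TAOCP", "Knuth")]),
        QueryResult("q2", [("TAOCP", "Knuth")]),  # only a duplicate record
        QueryResult("q3", []),                    # empty response page
        QueryResult("q4", [("Specifying Systems", "Lamport")]),
    ]
    print(crawl_stats(log))  # (4, 0.5, 2)
```

Under this definition, queries that return only overlapping or empty results lower the harvest rate while still adding to the query count.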
Keywords/Search Tags: web resource acquisition, deep web, domain-driven, ontology