
Research on Technologies for Deep Web Data Source Discovery

Posted on: 2010-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: L Li
Full Text: PDF
GTID: 2178360275958655
Subject: Computer software and theory
Abstract/Summary:
With the rapid growth of the Internet and the steady expansion of the information it holds, the Web now contains a vast amount of information of many kinds, most of it high-quality structured information. In most cases this information is stored in online databases, and users can reach it only by submitting queries through a search interface; we call it Deep Web information. To provide a high-quality search service over structured information, the first step is to collect and integrate it, so that users can quickly and accurately find what they need. Collecting Deep Web information in turn begins with discovering its data sources. In this thesis, we study the technologies relevant to data source discovery and propose related algorithms and models. The main work of this thesis includes:
(1) Determination of Deep Web search interfaces. The search interface is the entry point to Deep Web information, so discovering data sources is in effect discovering search interfaces. We propose a search interface determination algorithm based on a graph structure built from Form features.
(2) A distributed crawling technique to address the problems encountered in discovering Deep Web data sources. We propose a crawler framework and algorithm for search interface determination.
(3) A comparison of the advantages and disadvantages of different interface extraction techniques, and a proposed extraction technique based on the DOM tree. By locating elements precisely, this method handles the problems that arise in interface extraction better.
(4) Preliminary processing of the raw data sources, chiefly duplicate removal, to obtain a set of search Forms that contains no duplicates.
In addition, we verify the effectiveness of the proposed methods and techniques through experiments.
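
As a rough illustration of the extraction and duplicate-removal steps described above, the following Python sketch walks the tag structure of each page, collects every Form together with its input controls, and keeps one Form per distinct signature. It is not the algorithm proposed in the thesis: the names `FormCollector`, `form_signature`, and `extract_unique_forms` are hypothetical, and the signature used here (action, method, and sorted visible field names) is an assumption chosen only to make the idea concrete.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collect each <form> together with the input controls nested in it."""

    FIELD_TAGS = {"input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.forms = []       # finished forms, in document order
        self._current = None  # form being read, or None outside a form

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
            }
        elif tag in self.FIELD_TAGS and self._current is not None:
            if tag == "input":
                kind = (attrs.get("type") or "text").lower()
            else:
                kind = tag
            self._current["fields"].append((kind, attrs.get("name") or ""))

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def form_signature(form):
    """Assumed notion of 'same Form': action, method, visible field names."""
    names = tuple(sorted(
        name for kind, name in form["fields"]
        if name and kind not in ("hidden", "submit")
    ))
    return (form["action"], form["method"], names)


def extract_unique_forms(pages):
    """Return the Forms found in the given HTML strings, without duplicates."""
    seen = set()
    unique = []
    for html in pages:
        collector = FormCollector()
        collector.feed(html)
        collector.close()
        for form in collector.forms:
            signature = form_signature(form)
            if signature not in seen:
                seen.add(signature)
                unique.append(form)
    return unique
```

Under this assumed signature, hidden fields and submit buttons are ignored, so two copies of the same Form that differ only in a session token or in field order count as duplicates. A real system would likely need a more tolerant comparison, and deciding whether a Form is a search interface at all is a separate step not covered here.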
Keywords/Search Tags: Deep Web, Data Source Discovery, Feature Graph Structure, Search Interface Determination, Feature Extraction