
The Key Technology Research Of Deep Web Data Integration

Posted on: 2013-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: K Liu
Full Text: PDF
GTID: 2248330374479792
Subject: Computer application technology
Abstract/Summary:
With the rapid development of dynamic Web technology, more and more information is stored in back-end online databases. Traditional Web crawlers cannot reach this information, which is accessible only through query interfaces, so traditional search engines cannot index it. As a result, a large amount of useful information is not quickly or easily available to users. This part of the Web is called the Deep Web. Deep Web content is rich in information, high in quality, and strongly topic-focused, so Deep Web information integration has attracted growing attention from researchers in China and abroad. This thesis studies three key technologies in the Deep Web information integration process: Deep Web entry discovery, Deep Web query transformation, and Deep Web result extraction. The specific work is as follows:

(1) Domain ontology
Ontology, as a form of knowledge representation, has been applied in many fields of study. This thesis uses a domain ontology to improve the accuracy of entry discovery and query transformation. To build the domain ontology, manually collected Deep Web entry pages serve as samples, and the query attributes on those pages are used to construct the ontology. The resulting ontology directly describes the attribute information of Deep Web query interfaces. Because no domain experts were available to support the work, the ontology is not comprehensive, so it is expanded automatically as it is used.

(2) Deep Web entry discovery
Based on a study of Deep Web entry pages, this thesis proposes a new entry discovery method that adds a form discovery module and an entry discovery module to a topic-focused crawler. The crawler uses a Bayes classifier so that it keeps crawling topic-relevant pages. During crawling, the form discovery module checks whether a page contains a form tag; if one is found, the domain ontology is used to check the attribute information of the form on that page.

(3) Deep Web query transformation
Based on a study of query translation, this thesis proposes using an attribute matching table to speed up attribute matching. The query content is first matched against the attribute matching table. If the match succeeds, the query is passed directly to the local query interface; if it fails, the attributes are matched against the ontology. This simplifies the query transformation process.

(4) Deep Web result extraction
For result extraction, this thesis uses a DOM tree together with a page information matching module to locate result information on a page. The approach rests on the observation that the header, footer, and sidebar of result pages stay the same or similar from page to page; only the results change, while the style does not. Each page can therefore be parsed into a DOM tree, information matching is used to cut the repeated content out of the tree, and the result information that remains in the tree is then extracted.
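
As an illustration only, the lookup order described in point (3) can be sketched in a few lines of Python: try the attribute matching table first, and fall back to the ontology only on a miss. This is a minimal sketch under stated assumptions, not the thesis's implementation. The toy `ONTOLOGY`, the function names, and the per-interface `match_table` dictionary are all hypothetical, and writing a successful ontology match back into the table is an assumption about how the table might be filled.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy ontology for a book domain: concept -> synonym labels.
# A real domain ontology would be far richer; this only shows the lookup order.
ONTOLOGY: Dict[str, Set[str]] = {
    "title": {"title", "book title", "name"},
    "author": {"author", "writer", "creator"},
    "price": {"price", "cost"},
}


def normalize(label: str) -> str:
    """Lower-case a label and collapse whitespace so lookups are uniform."""
    return " ".join(label.lower().split())


def match_via_ontology(query_attr: str, interface_attrs: List[str]) -> Optional[str]:
    """Slow path: find an interface attribute sharing a concept with query_attr."""
    wanted = normalize(query_attr)
    for synonyms in ONTOLOGY.values():
        if wanted in synonyms:
            for attr in interface_attrs:
                if normalize(attr) in synonyms:
                    return attr
    return None


def translate_attribute(
    query_attr: str,
    interface_attrs: List[str],
    match_table: Dict[str, str],
) -> Optional[str]:
    """Map a query attribute to an attribute of one local query interface.

    match_table is the (hypothetical) attribute matching table kept for that
    interface, keyed by normalized query attribute.
    """
    key = normalize(query_attr)
    # Fast path: the attribute matching table already holds the answer.
    if key in match_table:
        return match_table[key]
    # Fallback: match through the ontology.
    target = match_via_ontology(query_attr, interface_attrs)
    if target is not None:
        # Assumption: remember the match so the next lookup takes the fast path.
        match_table[key] = target
    return target
```

With an empty table, a first call such as `translate_attribute("Writer", ["Book Title", "Author"], table)` goes through the ontology; a later call with the same attribute is answered from the table alone. An attribute the toy ontology does not cover, such as "ISBN", returns `None` and leaves the table unchanged.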
Keywords/Search Tags: Deep Web, Ontology, Entry discovery, Data integration, Query transformation, DOM tree