Research On Deep Web Data Extraction Based On Tree Structures

Posted on:2008-06-02

Degree:Master

Type:Thesis

Country:China

Candidate:X Wang

Full Text:PDF

GTID:2208360212486533

Subject:Computer application technology

Abstract/Summary:

With the development of the Internet, the number of Web sites and Web pages has grown explosively, and a huge amount of information is now available online. Web pages differ in form and content because they are built by different developers, so Web data is heterogeneous. For this reason, automatically acquiring useful information and data is a very challenging task.

Traditional Web search engines usually find only static Web pages, which in fact make up a small portion of the pages on the Internet. A great deal of information cannot be found by traditional search engines; this part of the Web is the Deep Web. To reach it, forms must be submitted and the relevant information extracted automatically from the returned pages. The Deep Web usually refers to the part of the Web that search engines cannot find, in particular dynamically generated pages. How to make effective use of these hidden information resources has become a problem worth studying.

This thesis uses the Tag tree to purify sample pages, generate extraction rules, and extract information from target pages. Tidy is applied to transform each HTML document into an XHTML document. Based on the XHTML document, the location of the data is found by comparing the Tag trees of similar pages, and extraction rules for the target pages are then generated. Starting from the root of the Tag tree, a matching function is applied iteratively. The wrapper is implemented in XSLT to perform the information extraction.

In summary, this thesis applies Tag tree matching to find data items and extract data. The extraction rules serve as parameters for extracting information from target pages, and the results are stored in an XML document.
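The tag-tree comparison described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes the pages have already been converted to well-formed XHTML without a namespace declaration (the step the abstract assigns to Tidy), it assumes the similar pages share the same tag structure, and it replaces the XSLT wrapper with path lookups from Python's standard `xml.etree.ElementTree` module. The sample pages and the names `data_paths`, `extraction_rules`, and `extract` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample pages, assumed to be Tidy-style well-formed XHTML.
PAGE_A = ("<html><body><h1>Results</h1>"
          "<div><span>Book A</span><span>10.00</span></div></body></html>")
PAGE_B = ("<html><body><h1>Results</h1>"
          "<div><span>Book B</span><span>12.50</span></div></body></html>")
TARGET = ("<html><body><h1>Results</h1>"
          "<div><span>Book C</span><span>8.75</span></div></body></html>")


def data_paths(a, b, path):
    """Walk two tag trees in parallel, starting at matching nodes.

    Nodes whose text is identical in both pages are treated as template;
    nodes whose text differs are treated as data, and their path is kept.
    """
    found = []
    if (a.text or "").strip() != (b.text or "").strip():
        found.append(path)
    seen = {}  # per-tag sibling counter, used for positional path steps
    for child_a, child_b in zip(a, b):
        seen[child_a.tag] = seen.get(child_a.tag, 0) + 1
        if child_a.tag != child_b.tag:
            continue  # structures diverge here; this sketch skips the subtree
        step = f"{path}/{child_a.tag}[{seen[child_a.tag]}]"
        found += data_paths(child_a, child_b, step)
    return found


def extraction_rules(page_a, page_b):
    """Derive extraction rules (paths relative to the root) from two similar pages."""
    root_a, root_b = ET.fromstring(page_a), ET.fromstring(page_b)
    if root_a.tag != root_b.tag:
        return []
    return data_paths(root_a, root_b, ".")


def extract(target, rules):
    """Apply the rules to a target page and return the result as an XML string."""
    root = ET.fromstring(target)
    record = ET.Element("record")
    for rule in rules:
        node = root.find(rule)
        item = ET.SubElement(record, "item", path=rule)
        item.text = (node.text or "").strip() if node is not None else ""
    return ET.tostring(record, encoding="unicode")


rules = extraction_rules(PAGE_A, PAGE_B)
print(rules)
print(extract(TARGET, rules))
```

In this sketch the rules are plain path strings passed as parameters to `extract`, which mirrors the abstract's description of rules as parameters and of results stored as XML. Real Tidy output typically carries the XHTML namespace, so the tag names and paths would need namespace handling before this approach applies.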

Keywords/Search Tags:

Deep Web, data extraction, Tag tree, XML, XSLT

Related items

1	Research On Data Extraction Of Deep Web Based On Visual Information And Tree Match
2	Research On Web Information Extraction Techniques
3	Research And Design, Based On Xml And Xslt, Web Information Extraction
4	Based On The Xml Deep Web Information Extraction System With The Initial Implementation
5	Research On Deep Web Data Acquisition Based On Visual Information And DOM Tree
6	Research On Domain-oriented High-quality Deep Web Data Integration Technologies
7	Research On Web Information Extraction Technology Based On Deep Web
8	Study Of Web Data Extraction Based On Webpage Structure
9	Research And Implementation Of Data Model Transformations Based On XSLT
10	Research And Implementation Of Data Model Transformations Based On Xslt