Research On Deep Web Data Acquisition Based On Visual Information And DOM Tree

Posted on:2015-03-06

Degree:Master

Type:Thesis

Country:China

Candidate:X H Li

Full Text:PDF

GTID:2268330428998402

Subject:Computer software and theory

Abstract/Summary:

With the rapid expansion of Internet information in recent years, the commercial value of data is being continuously explored to provide value-added services. Applications such as opinion analysis, meta search, comparison shopping, and big data applications are mostly based on Deep Web data acquisition and integration. As more and more back-end databases with high-quality, domain-specific information appear, Deep Web data acquisition and integration remains an active research field.

In order to retrieve tuples from the target database effectively and to extract structured data from dynamically generated pages, the main contents of this paper are as follows:

1) Because query interfaces have multiple attributes and top-k restrictions, we first build a data space tree model and prune the tree using heuristic information. Second, we give a dynamic strategy for selecting text field values in mixed-attribute interfaces. Experiments verify that this scheme effectively improves data siphoning efficiency.

2) In order to locate the main data area of a Deep Web page automatically, this paper gives a set of heuristic features with a method for quantifying them, and puts forward a linear weighted method based on the quantized values to mine the main data region.

3) In order to extract the search results, this paper proposes an algorithm named block-regrouping for data record extraction. It uses the visual information of the search results page and the DOM tag tree of the page to compute visual block similarity. Experiments are conducted to verify the efficiency of this method.

4) In order to shorten extraction time for records that share the same template, we propose a method to generate a wrapper for the data source.

5) On the basis of the existing work, we design a prototype Deep Web data extraction system. In addition, this paper conducts experiments over controlled and real site databases to illustrate the feasibility of this system.
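The linear weighted method in item 2) can be pictured with a minimal sketch. The abstract does not list the heuristic features or their weights, so the feature names (`area_ratio`, `centrality`, `repetition`), the weight values, and the helper names below are all hypothetical and serve only to illustrate the general idea of scoring candidate regions by a weighted sum of quantized features.

```python
from dataclasses import dataclass

# Hypothetical features and weights: the abstract does not specify them.
WEIGHTS = {"area_ratio": 0.5, "centrality": 0.3, "repetition": 0.2}


@dataclass
class Region:
    name: str
    features: dict  # feature name -> quantized value in [0, 1]


def score(region, weights=WEIGHTS):
    """Linear weighted sum of the region's quantized feature values."""
    return sum(w * region.features.get(f, 0.0) for f, w in weights.items())


def main_data_region(regions, weights=WEIGHTS):
    """Return the candidate region with the highest weighted score."""
    return max(regions, key=lambda r: score(r, weights))


# Illustrative candidate regions with made-up feature values.
candidates = [
    Region("sidebar", {"area_ratio": 0.1, "centrality": 0.2, "repetition": 0.6}),
    Region("results", {"area_ratio": 0.7, "centrality": 0.9, "repetition": 0.8}),
    Region("footer", {"area_ratio": 0.1, "centrality": 0.1, "repetition": 0.1}),
]
best = main_data_region(candidates)
print(best.name)
```

In a real system the candidates would presumably be DOM subtrees and the feature values would come from rendered layout information; here they are plain dictionaries so that the scoring step stays visible.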

Keywords/Search Tags:

Deep Web, Data siphoning, Data region mining, Record extraction, Wrapper

Related items

1	Research On Adaptive Wrapper In Deep Web Data Extraction
2	Research On Deep Web Data Extraction And Refining Methods
3	Research On Wrapper Adaptation In Web Data Integration
4	Research On Key Issues In Deep Web Data Integration
5	Research Of Data Extraction Technology Based On Tag Tree From List Pages
6	The Application Of Data Mining Technology In The Medical Record Information Management
7	Research On Data Extraction And Schema Recognition On Deep Web
8	Research And Implementation A Wrapper For Web Data-Extraction Based On Ontology
9	Research And Application Of Data Mining Technologies In XML-Based Electronic Patient Record
10	The Research Of Data Cleaning For Data House And Data Mining