Research On Deep Web Search Interface And Search Result Extraction

Posted on:2011-06-22

Degree:Doctor

Type:Dissertation

Country:China

Candidate:H B Zhang

Full Text:PDF

GTID:1118330332472842

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of the Internet, a large number of online Web databases have become accessible. The information stored in these databases is called the Deep Web. It is generated dynamically in response to queries submitted through a search interface, so traditional search engines cannot index it. To give users convenient access to Deep Web information, Deep Web data integration has become an urgent problem in information retrieval. Understanding Deep Web search interfaces is a crucial part of Deep Web data integration. Based on an analysis of the current state of research on Deep Web data integration, this paper addresses several key problems related to Deep Web search interfaces, including the proposal of a Deep Web Domain Model, Deep Web search interface discovery and schema extraction, and search result extraction and annotation. The main contributions and innovations are:
● This paper proposes a Deep Web Domain Model based on research into Deep Web search interfaces. The Domain Model contains all the information of the Deep Web interfaces that belong to the same domain. This paper analyzes the feasibility of the Deep Web Domain Model theoretically and gives methods for constructing and storing it. The Domain Model can be applied to many problems in Deep Web data integration and enables a breakthrough for the integration system.
● This paper proposes a Post-Query-based approach to Deep Web search interface discovery called PostClassifier. PostClassifier first filters interfaces using rules produced by the Pre-Query approach, in order to reduce the resource consumption of query submission. It then uses the Domain Model to judge the domain of the interface and to fill in the keywords. PostClassifier also proposes a method for identifying an interface's type based on an analysis of the query results of different kinds of interfaces.
● This paper proposes an approach to interface schema extraction that, for the first time, deals with labels and elements separately. First, we construct a label tree for the interface; in this step we find the corresponding node in the Domain Model for each label and handle repeated labels and lost labels. Then we find the matching label for each element. We use the label's corresponding node in the Domain Model to match the element, so that more information is available for finding an element's label. If lost labels have matching elements, they are also handled correctly when the results of the previous two steps are merged to produce the final interface schema.
● This paper proposes an approach to search result extraction and annotation called EaSd. EaSd uses VIPS as the representation format of the HTML page. Because query keywords tend to appear in the query results, EaSd uses them to discover each record and then the record block. EaSd aligns the data units of all the records to find their common patterns or features, which helps annotation. Using both the Domain Model and the local interface schema for annotation resolves the problems of inadequate local interface schemas and inconsistent labels. We use several annotation methods to improve recall and precision. Experimental results show that EaSd can discover and annotate most records.
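
The keyword cue behind record discovery can be illustrated with a small sketch. This is not the dissertation's algorithm, which works on VIPS output and also aligns data units; it only shows the idea that blocks containing the submitted keywords are likely records, and that the container holding most of them is likely the record block. The `Block` class, its `parent_id` field, and `find_record_block` are hypothetical names introduced for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Block:
    """Hypothetical, simplified stand-in for one visual block of a result page."""
    block_id: str
    parent_id: Optional[str]
    text: str


def find_record_block(
    blocks: List[Block], keywords: List[str]
) -> Tuple[Optional[str], List[str]]:
    """Return (record_block_id, record_ids) using keyword occurrence as the only cue."""
    lowered = [k.lower() for k in keywords]
    # Step 1: a block is a candidate record if its text contains any query keyword.
    hits = [b for b in blocks if any(k in b.text.lower() for k in lowered)]
    if not hits:
        return None, []
    # Step 2: the parent that holds the most candidate records is taken as the record block.
    counts = Counter(b.parent_id for b in hits)
    record_block_id, _ = counts.most_common(1)[0]
    # Step 3: keep only the candidates that sit inside that record block.
    records = [b.block_id for b in hits if b.parent_id == record_block_id]
    return record_block_id, records


# Toy page: a header that repeats the query, three result rows, and a footer.
page = [
    Block("header", "page", "Search results for java"),
    Block("r1", "results", "Effective Java, 3rd edition"),
    Block("r2", "results", "Java Concurrency in Practice"),
    Block("r3", "results", "Thinking in Java"),
    Block("footer", "page", "Contact us"),
]
record_block, records = find_record_block(page, ["java"])
```

In this toy page the header also contains the keyword, but it is discarded because its parent holds only one keyword-bearing block, while the `results` container holds three. A real page would need further cues, such as the data-unit alignment described above, to recover records that happen not to contain the keyword.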

Keywords/Search Tags:

Deep Web search interface, domain model search, interface discovery, search interface schema extraction, search result extraction and annotation

Related items

1	Deep Web Interface Discovery And Extraction Research Based On Rules
2	Research On The Deep Web Search Interface Identification And Extraction Technology
3	Automatic wrapper generation for the extraction of search result records from search engines
4	Deep Web Interface Discovery Based On Domain Knowledge
5	Research On (Situation Map) Interface Complexity Of Search Task Based On Visual Analysis Of Eye Movement Data
6	The Study On Deep Web Interface Integration And Search Strategy
7	The Relevant Technologies Research On Deep Web Source Discovery
8	Research And Integration Of Knowledge Acquisition System Based On Meta-Search
9	Research On APK Crawler With Automatic Pagination Detection And Search Results Extraction
10	Study Of Search Method Based On Group Characteristics