Research On Query Planning For Deep Web

Posted on:2013-11-26

Degree:Master

Type:Thesis

Country:China

Candidate:Z J Wang

Full Text:PDF

GTID:2248330377958802

Subject:Computer application technology

Abstract/Summary:

A growing share of online data is published through Web databases hidden behind query forms; together these sources form what is referred to as the deep web. In contrast to the surface web, where HTML pages are static and data is stored as document files, deep web data is stored in backend databases, and dynamic HTML pages are generated only after a user submits a query by filling in an online form. According to statistics from BrightPlanet, deep web databases hold 500 times as much data as static pages, and the number of such data sources continues to grow rapidly every year. Research on the deep web is therefore both necessary and significant.

Web databases are large-scale, autonomous, heterogeneous and dynamic, and sources may have diverse and limited query capabilities, so query processing in deep web data integration is more challenging than in a traditional distributed environment. To deal with source autonomy and heterogeneity, this thesis presents a method for describing data sources. How large is the schema vocabulary of these sources? To answer this question, we performed an informal survey: using search engines (e.g., google.com) and Web directories (e.g., invisibleweb.com), we collected a total of 200 sources, 50 in each of the Movies, Books, Automobiles and Music Records domains. The survey found that while sources proliferate, their aggregate schema vocabulary tends to converge at a relatively small size. Inspired by this result, we build an inverted index for each vocabulary.

We also present a modular scheme for generating efficient, feasible query plans for target queries. Five modules work together to accomplish this: expansion, pretreatment, rewriting, relevant data source search, and generation. We describe an algorithm that generates logic plans effectively from the inverted index, and an algorithm that finds an executable ordering for a logic plan.

We also show that, because sources restrict how their information can be retrieved, sources not mentioned in a logic plan can contribute to generating efficient, feasible query plans, since they can provide useful bindings. We identify the cases in which these off-query accesses are useless, and prove that in these cases efficient, feasible query plans can be generated using only the sources in the logic plan. For the cases where off-query accesses are necessary, we propose an algorithm for finding all the useful sources for a logic plan. Experiments show that our algorithm for generating executable query plans has good efficiency, accuracy and scalability.
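
The two ideas named in the abstract, an inverted index over the schema vocabulary and an executable ordering under query-capability restrictions, can be illustrated with a minimal sketch. This is not the thesis's algorithm, whose details the abstract does not give; it is a simple greedy ordering under the assumption that each source declares a set of attributes that must be bound before it can be queried and a set of attributes it returns. All names (`build_inverted_index`, `find_executable_order`, and the sources `catalog` and `prices`) are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(source_schemas):
    """Map each attribute name to the set of sources whose schema mentions it."""
    index = defaultdict(set)
    for source, attributes in source_schemas.items():
        for attribute in attributes:
            index[attribute].add(source)
    return index


def find_executable_order(plan_sources, required_inputs, outputs, initial_bindings):
    """Greedily order sources so that each one's required inputs are already bound.

    Assumed model (not taken from the thesis): a source can be queried once
    every attribute in required_inputs[source] is bound, and querying it binds
    every attribute in outputs[source]. Returns a list of sources, or None if
    no executable ordering exists for this set of sources.
    """
    bound = set(initial_bindings)
    remaining = list(plan_sources)
    order = []
    while remaining:
        ready = [s for s in remaining if required_inputs[s] <= bound]
        if not ready:
            return None  # some source can never obtain its required bindings
        source = ready[0]
        order.append(source)
        bound |= outputs[source]
        remaining.remove(source)
    return order


# Hypothetical book sources: `catalog` needs a title, `prices` needs an ISBN.
schemas = {"catalog": ["title", "isbn"], "prices": ["isbn", "price"]}
required = {"catalog": {"title"}, "prices": {"isbn"}}
returned = {"catalog": {"title", "isbn"}, "prices": {"isbn", "price"}}

index = build_inverted_index(schemas)
plan = find_executable_order(["prices", "catalog"], required, returned, {"title"})
stuck = find_executable_order(["prices"], required, returned, {"title"})
```

Because the set of bound attributes only grows, a source that is ready stays ready, so the greedy choice finds an ordering whenever one exists. In the sketch, `stuck` is `None`: with only `prices` in the plan, nothing supplies the ISBN binding. This is the situation the abstract describes, in which a source outside the logic plan (here `catalog`) is needed to provide a useful binding.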

Keywords/Search Tags:

Web database, query capabilities, feasible query plans

Related items

1	Research On The Query Information In The Chinese Query Sentences Of Database
2	The Research And Application Of Database Value-based Query Optimization
3	Research On Key Technologies Of Distributed Rank-aware Query Processing
4	Research On Data Query Processing And Optimization In Distributed Database
5	Based On The Keyword Query Database Search Engine System Design And Implementation
6	Based On Historical Query Relational Database Query Optimization Keyword Research Questions
7	Study And Implementation Of Query Mechanism In EDBMS
8	Research Of Query Rewriting And Multi-join Query Optimization Based On GA Of Database
9	Research Of Query Rewriting And Multi-join Query Optimization Based On Ga Of Database
10	Research On Data Query Optimization Based On XML Database