
Research On Database Discovery And Selection In The Deep Web

Posted on: 2010-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: N Zhao
Full Text: PDF
GTID: 2178360278472620
Subject: Computer software and theory
Abstract/Summary:
As network technology matures, the rapid growth of the Web is turning it into a huge, heterogeneous data repository. According to how deeply its data is stored, the Web can be divided into two parts: the Surface Web and the Deep Web. The Deep Web is believed to contain 550 times more data than the Surface Web, and its volume is increasing rapidly. However, Web databases can only be accessed through the query interfaces they provide. As a result, the information in the Deep Web cannot be indexed by traditional search engines such as Google and Yahoo. To access and use this information effectively and efficiently, the data must be integrated. Because of the large scale of the Deep Web, improving integration efficiency has become a very active research topic in the database field.

This paper takes the Deep Web data integration system as its target application. Faced with the myriad heterogeneous data sources in the Deep Web, we focus on improving integration efficiency during database discovery and selection. The main work covers two aspects.

Query Routing: Much recent research on data integration focuses on "domain-based" integration. To reduce the number of data sources involved, we need to find the sources relevant to the user's requirements. This paper introduces a source selection system, based on an attribute co-occurrence framework, for ranking and selecting Web sources.

Increment-Based Random Walk Sampling: Selecting appropriate Web databases to which queries are submitted can also improve integration efficiency. We propose INC-HIDDEN-DB-SAMPLER, an increment-based approach that improves on HIDDEN-DB-SAMPLER so that it handles all attributes on the query interface. It obtains a set of records from the Web database as samples.

This paper first introduces a "domain-based" data integration framework and then focuses on database discovery and selection to improve integration efficiency. The subject is a technology with broad current application, so the work has both theoretical research value and practical significance.
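
As a rough illustration of what random-walk sampling through a query interface can look like, the sketch below drills down through randomly chosen attribute values until the interface returns a result set small enough to be shown in full, then picks one record from it. This is not the thesis's INC-HIDDEN-DB-SAMPLER, whose details the abstract does not give. The names `make_interface` and `random_walk_sample`, the top-k overflow model, and the toy car table are all assumptions made for the example, and the sketch omits any correction for sampling skew.

```python
import random

def make_interface(records, k):
    """Simulate a top-k Web query interface over an in-memory table (hypothetical)."""
    def query(conditions):
        matches = [r for r in records
                   if all(r[a] == v for a, v in conditions.items())]
        overflow = len(matches) > k       # more matches than the interface shows
        return matches[:k], overflow
    return query

def random_walk_sample(query, domains, rng, max_restarts=1000):
    """Draw one record by a random drill-down; return None if every walk fails."""
    attrs = list(domains)
    for _ in range(max_restarts):
        rng.shuffle(attrs)                # random attribute order for this walk
        conditions = {}
        results = []
        for attr in attrs:
            conditions[attr] = rng.choice(domains[attr])
            results, overflow = query(conditions)
            if not results:
                break                     # empty result: abandon this walk
            if not overflow:
                return rng.choice(results)  # all matches visible: sample one
        else:
            # Every attribute is fixed but the query still overflows:
            # fall back to one of the visible top-k records.
            if results:
                return rng.choice(results)
    return None

# Toy data (assumed): six cars described by two categorical attributes.
domains = {"make": ["audi", "bmw"], "color": ["red", "blue", "green"]}
records = [{"make": m, "color": c}
           for m in domains["make"] for c in domains["color"]]

query = make_interface(records, k=2)
sample = random_walk_sample(query, domains, random.Random(0))
print(sample)
```

Calling `random_walk_sample` repeatedly yields a set of sample records; a practical sampler would also need to correct for the fact that records reachable through short walks are more likely to be drawn.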
Keywords/Search Tags:Deep Web, data integration, database discovery, database selection