Study On Data Sources Discovery And Selection On Deep Web

Posted on:2009-12-31

Degree:Master

Type:Thesis

Country:China

Candidate:M F Li

Full Text:PDF

GTID:2178360308478306

Subject:Computer application technology

Abstract/Summary:

With the rapid development of the Internet, the number of data sources on the Deep Web is growing quickly. However, these data sources can be reached only through dynamic query responses. Traditional search engines such as Google and Baidu can hardly index or search them, so they are not fully utilized. Research on Deep Web search engines that satisfy the broad demands of users has therefore become a primary focus of information research. Because of the characteristics of the Deep Web, however, integrating its data sources is technically very difficult.

To discover and integrate these Deep Web data sources, we first analyzed the state of the art on the Deep Web, proposed a data integration framework for the Deep Web, analyzed its four main mechanisms (repository construction, query processing, query transformation, and result integration), and described the difficulties of Deep Web integration. Second, we described the Deep Web crawler architecture. After analyzing interface styles and form processing mechanisms, we adopted a four-level data source discovery model and presented DeepRunner, a domain-based form crawler architecture, together with the DOER algorithm for acquiring data sources within one domain. Third, we elaborated on the attribute distribution of the Deep Web and proposed an attribute-based dominant pattern growth algorithm for top-k data source selection, which we then improved by incorporating the co-occurrence of attributes, further raising precision and recall. Finally, we described a query translation and result integration mechanism.

Experimental results demonstrated the feasibility of DeepRunner for acquiring Deep Web data sources within one domain. Experiments on large amounts of data showed the advantages of the domain-based Deep Web discovery algorithm DOER and validated the effectiveness of the attribute-based dominant pattern growth algorithm and the co-occurrence combined approach. These two algorithms performed much better than the traditional top-k data source selection strategy, especially in large-scale data source integration.

Keywords/Search Tags:

Deep Web, domain, data sources discovery, data sources selection, Top-k, attribute based dominant pattern growth algorithm, co-occurrence

Related items

1	Research On Key Technologies Of Ontology-Based Deep Web Information Integration
2	Research On Key Technologies Of Deep Web Information Integration
3	One Improved Description Method Of Data Sources In The Deep Web
4	Research On Distributed Sources Direction-of-Arrival Estimation
5	Integrating Deep Web data sources
6	Research And Implementation Of Subject-Oriented Structured Data Integration On Multiple Web Sources
7	Data Preprocessing And Pattern Mining In Multiple Data Sources
8	Design And Realization Of SyncML-based Synchronization Method For Heterogeneous Data Sources In Mobile Computing Environment
9	The Research Of Information Evaluation Based On Sources Dependence
10	Research On The Deep Web Data Sources Classification