
Automatic discovery and selection of text resources on the Web, towards building a very large-scale and effective metasearch engine, Webscales

Posted on: 2003-02-16
Degree: Ph.D
Type: Dissertation
University: State University of New York at Binghamton
Candidate: Wu, Zonghuan
Full Text: PDF
GTID: 1468390011489682
Subject: Computer Science
Abstract/Summary:
A metasearch engine is a system that supports unified access to multiple component search engines. We explore a complete set of technologies for building a very large-scale metasearch engine that can access up to hundreds of thousands of component search engines.

One major challenge is to identify search engines and to collect and maintain representative information about them. The problem is to find search engines on the Web and to build wrappers for them that enable automatic sending of queries and extraction of feature information. Because of the huge number of search engines involved, this must be done in a highly effective manner. A set of corresponding techniques is designed and developed to achieve accurate search engine wrapping. These techniques are highly automatic, efficient, and accurate.

Database selection is another major challenge in building a large-scale metasearch engine. The problem is to efficiently and accurately determine a small number of potentially useful component search engines to invoke for each user query. To enable accurate selection, metadata that reflect the contents of each search engine need to be collected and used. In this dissertation, a highly scalable and accurate database selection method is proposed. This method has several novel features. First, the metadata representing the contents of all search engines are organized into a single integrated representative. Such a representative yields both computational efficiency and storage efficiency. Second, the selection method is based on a theory for ranking search engines optimally. Experimental results indicate that this new method is very effective.

Furthermore, because the techniques described in this dissertation are highly automated, they also make metasearch engines easy to construct: a metasearch system can be built on top of a set of search engines with only a small amount of human involvement.
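The database selection idea described in the abstract can be illustrated with a small sketch. It is not the dissertation's method: the abstract does not give the representative's layout or the ranking theory, so the code below assumes a simple term-to-engine weight index and an additive score. The engine names, weights, and function names are all hypothetical.

```python
from collections import defaultdict


def build_integrated_representative(engine_term_weights):
    """Merge per-engine term weights into one term -> {engine: weight} index.

    Assumption: each engine is summarized by a dict of term weights. Storing
    all engines in one structure keyed by term means a query only touches the
    entries for its own terms.
    """
    integrated = defaultdict(dict)
    for engine, term_weights in engine_term_weights.items():
        for term, weight in term_weights.items():
            integrated[term][engine] = weight
    return dict(integrated)


def select_engines(integrated, query_terms, k=2):
    """Return up to k engine names ranked by summed weight over query terms.

    Assumption: an additive score stands in for the optimal ranking theory
    mentioned in the abstract. Ties are broken by engine name.
    """
    scores = defaultdict(float)
    for term in query_terms:
        for engine, weight in integrated.get(term, {}).items():
            scores[engine] += weight
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [engine for engine, _ in ranked[:k]]


# Hypothetical component search engines and term weights, for illustration only.
ENGINES = {
    "medline-like": {"protein": 0.9, "gene": 0.8},
    "sports-news": {"football": 0.9, "score": 0.7},
    "bio-preprints": {"protein": 0.6, "folding": 0.9},
}

if __name__ == "__main__":
    representative = build_integrated_representative(ENGINES)
    print(select_engines(representative, ["protein", "folding"], k=2))
```

Only engines that share at least one term with the query receive a score, so a query is never forwarded to engines with no matching metadata. A real system would replace both the weights and the scoring rule with the representative and ranking theory developed in the dissertation.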
Keywords/Search Tags: Search, Selection, Effective, Building, Large-scale, Automatic, Techniques