
Algorithm Design And System Implementation Of Search Engine Confederation

Posted on: 2005-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: H Liu
Full Text: PDF
GTID: 2168360152467679
Subject: Information and Communication Engineering
Abstract/Summary:
With the explosive growth of information on the Internet, centralized search engines face challenges in scalability, freshness, and specialized requirements. Distributed search engines solve the scalability problem of centralized systems to some degree, but are limited in precision and distributed organization. A highly scalable, meaningful, and practical method of resource organization and retrieval is therefore needed. This thesis designs the system architecture of such a distributed resource navigation system, the search engine confederation, and implements a log-based confederation prototype that provides rapid, accurate, and dynamic resource navigation.

After an analysis of the key technologies, the thesis presents the design of the search engine confederation architecture. It is a centrally controlled system: the center provides resource navigation, and the nodes are site search engines that can recommend one another through the center. The design is highly scalable and could become a standard for distributed information retrieval systems.

Since nodes are the basis of the confederation, we first developed high-quality search software for the nodes. Key issues include webpage crawling, preprocessing, indexing, and ranking. A novel block-based indexing optimization and a webpage ranking algorithm were adopted, and considerable engineering work was completed before the software was deployed at five sites in CERNET, which became the experimental platform for the confederation.

Considering the importance of search logs in information retrieval, and their characteristics of accurate result prediction and fast adaptability, this thesis puts forward the system design of a log-based confederation. Key issues include the system architecture, the log protocol format, and the log-based resource ranking algorithm. This design offers high scalability and practicability, as well as a creative application of search logs.

Finally, the detailed implementation of the log-based confederation is introduced, including log protocol generation and node information gathering, indexing, and retrieval. Based on the current five nodes, a prototype of the confederation was set up, and the experimental data demonstrate its performance and promising applications. In summary, the main contributions of this thesis are the research on distributed algorithms, the architecture design, the implementation of the site software, and the setup of the log-based confederation, which establish a solid foundation for the future development of the confederation.
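
The abstract names a log-based resource ranking algorithm at the center but does not give its details. The following is only a minimal sketch of the general idea, under assumed simplifications: each node reports log entries of the form (node, query, number of hits), and the center ranks nodes for a new query by how often its terms previously returned results on each node. The class name, method names, log fields, and scoring rule are all hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


class ConfederationCenter:
    """Hypothetical center that ranks member nodes using their search logs."""

    def __init__(self):
        # term -> node -> count of logged queries containing the term
        # that returned at least one result on that node
        self._evidence = defaultdict(lambda: defaultdict(float))

    def ingest_log(self, node, query, hits):
        """Record one assumed log entry: `query` on `node` returned `hits` results."""
        if hits <= 0:
            return  # queries with no results give no evidence for the node
        for term in set(query.lower().split()):
            self._evidence[term][node] += 1.0

    def rank_nodes(self, query):
        """Return node names ordered by accumulated evidence for the query terms."""
        scores = defaultdict(float)
        for term in set(query.lower().split()):
            for node, weight in self._evidence.get(term, {}).items():
                scores[node] += weight
        # Highest score first; ties broken by node name for determinism.
        return sorted(scores, key=lambda n: (-scores[n], n))


center = ConfederationCenter()
center.ingest_log("node-a", "network protocol", 12)
center.ingest_log("node-a", "network routing", 5)
center.ingest_log("node-b", "network security", 3)
center.ingest_log("node-b", "organic chemistry", 0)
ranking = center.rank_nodes("network routing")
```

In this sketch the center stores only aggregated term counts, not node indexes, which is one plausible reading of why a log-based design could adapt quickly as node content changes; the actual protocol format and ranking formula would have to come from the thesis itself.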
Keywords/Search Tags: search engine confederation, inverted index optimizing, webpage ranking, database ranking, log analysis