Distributed Search Engine Research

Posted on: 2015-07-17
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Sun
Full Text: PDF
GTID: 2298330467470261
Subject: Computer application technology
Abstract/Summary:
With the rapid growth of global digital information, the centralized architecture of earlier search engines can no longer meet users' data retrieval needs, so distributed search engines have emerged and developed quickly. A distributed search engine not only reduces cost but also speeds up search, and it provides scalability and redundancy.

The distributed search engine consists of three main components: the crawler, the indexer, and the query component.

Crawler: This section first describes the basis of the crawler, namely the HTTP protocol, and then introduces its two components, the crawler cluster and the storage cluster. Finally, it proposes the 2-Balance Dynamic Bloom Filter data structure and its insertion algorithm to curb the space growth of the Dynamic Bloom Filter.

Indexer: This section first describes inverted index technology, which allows pages to be searched by their content, and proposes a distributed inverted index algorithm based on the Map/Reduce model. For the resulting set of pages, a link-ranking algorithm assigns each page a score so that higher-scoring pages are returned first. To reduce the cost of the eigenvector computation, the calculation is distributed across multiple machines, and a PageRank algorithm based on the Map/Reduce model is proposed.

The last part uses the open-source tools Solr, Nutch, and Hadoop to build a small search engine and provides a user interface for queries.
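As an illustration of the inverted index idea mentioned in the Indexer section, the following is a minimal single-process sketch of the map, shuffle, and reduce steps in Python. It is not the algorithm from the thesis, whose details are not given here; the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_inverted_index`), the whitespace tokenizer, and the sample pages are all assumptions made for the example, and a real deployment would run these steps on a Map/Reduce framework such as Hadoop rather than in one process.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit one (term, doc_id) pair per distinct term in a page."""
    # Assumption: lowercase whitespace tokenization stands in for a real analyzer.
    for term in set(text.lower().split()):
        yield term, doc_id


def shuffle(pairs):
    """Shuffle step: group emitted doc ids by term, as the framework would."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_phase(term, doc_ids):
    """Reduce step: turn the doc ids for one term into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(pages):
    """Run map, shuffle, and reduce in sequence over a dict of doc_id -> text."""
    pairs = (
        pair
        for doc_id, text in pages.items()
        for pair in map_phase(doc_id, text)
    )
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


# Hypothetical sample pages, used only to exercise the sketch.
pages = {
    "p1": "distributed search engine",
    "p2": "search engine crawler",
    "p3": "bloom filter crawler",
}
index = build_inverted_index(pages)
```

Looking up a term then reduces to a dictionary access, for example `index["search"]` gives the list of pages that contain that word. Because each map call only reads one page and each reduce call only reads one term's group, both steps can in principle be spread across machines, which is the property a Map/Reduce formulation relies on.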
Keywords/Search Tags: Set representation and search, Bloom Filter, Hash Searching, Nutch, Hadoop