Distributed Search Engine Research

Posted on:2015-07-17

Degree:Master

Type:Thesis

Country:China

Candidate:Z C Sun

GTID:2298330467470261

Subject:Computer application technology

Abstract/Summary:

With the rapid growth of digital information worldwide, search engines built on a centralized architecture can no longer meet users' data retrieval needs, so distributed search engines have emerged and developed quickly. A distributed search engine not only reduces cost but also speeds up search, and it provides scalability and redundancy. It consists of three main components: the crawler, the indexer, and the query component.

Crawler: This part first describes the basis of the crawler, namely the HTTP protocol, and then introduces the crawler's two components, the crawler cluster and the storage cluster. Finally, it proposes the 2-Balance Dynamic Bloom Filter data structure and its insertion algorithm to curb the growth in space used by the Dynamic Bloom Filter.

Indexer: This part first describes inverted index technology, which finds pages according to their content, and proposes a distributed inverted index algorithm based on the Map/Reduce model. For the resulting set of pages, a link-based ranking algorithm assigns every page in the set a score. To relieve the computational complexity of calculating eigenvectors, the computation is distributed across other machines, and a PageRank algorithm based on the Map/Reduce model is proposed.

The last part uses the open-source tools Solr, Nutch, and Hadoop to develop a small search engine and provides a user interface for queries.
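To make the crawler's de-duplication structure concrete, the following is a minimal Python sketch of a standard Dynamic Bloom Filter: a chain of fixed-size Bloom filters in which a new filter is appended once the active one reaches its capacity. It is not the 2-Balance variant proposed in the thesis, whose details the abstract does not give. The class names, the SHA-256 double-hashing scheme, and the capacity and error-rate parameters are all assumptions chosen for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter (illustrative; hashing scheme is an assumption)."""

    def __init__(self, capacity, error_rate=0.01):
        self.capacity = capacity
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)
        self.count = 0

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd, non-zero step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

    def is_full(self):
        return self.count >= self.capacity


class DynamicBloomFilter:
    """Chain of Bloom filters that grows as items are inserted."""

    def __init__(self, capacity_per_filter=1000, error_rate=0.01):
        self.capacity_per_filter = capacity_per_filter
        self.error_rate = error_rate
        self.filters = [BloomFilter(capacity_per_filter, error_rate)]

    def add(self, item):
        """Insert item; return False if it was (probably) already present."""
        if item in self:
            return False
        if self.filters[-1].is_full():
            # The active filter is at capacity, so append a fresh one.
            self.filters.append(BloomFilter(self.capacity_per_filter, self.error_rate))
        self.filters[-1].add(item)
        return True

    def __contains__(self, item):
        # A lookup must check every filter in the chain.
        return any(item in f for f in self.filters)


# Example: de-duplicating URLs seen by a crawler (URLs are made up).
seen = DynamicBloomFilter(capacity_per_filter=100)
urls = ["http://example.com/page/%d" % i for i in range(250)]
fresh = [u for u in urls if seen.add(u)]
```

Because each new filter adds its own bit array and every lookup scans the whole chain, both space and query cost grow with the number of inserted items, which is the growth the abstract says the 2-Balance structure is meant to relieve.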

Keywords/Search Tags:

Set representation and search, Bloom Filter, Hash Searching, Nutch, Hadoop

Related items

1	The Research On The Muti-keywords Search Technology Over P2P Network Based On Bloom Filter
2	Research And Implementation Of Search Engine Based On Nutch Architecture
3	Research On A Memory-Efficient Bloom Filter For DPI
4	Study And Application Of L-Priority Bloom Filter
5	Inquisition Of Nutch's Application On Searching Network-based Learning Resources
6	Researches And Applications On Efficient Bloom Filter For Big Data
7	Invertible Bloom Filter And Its Application Of Identifying Elephant Flows
8	The Simulation And Comparison Of Bloom Filter And Its Improved Algorithms In The Distributed Environment
9	Study And Implementation On Chinese Word Segmentation Algorithm Of Search Engine Based On Nutch
10	Research And Design Of Distributed Vertical Search Engine Based On Hadoop