
Research And Implementation Of Distributed Search Engine Based On Nutch

Posted on: 2016-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: K Zou
Full Text: PDF
GTID: 2298330479450163
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, people have become increasingly dependent on it for information. Search engines serve as a bridge between users and the massive volume of information on the network. However, as the number of Internet users grows rapidly and the amount of network information increases exponentially, network traffic rises and the traditional centralized search engine has encountered a bottleneck. Distributed computing technology, with its powerful data processing capability, eases this problem. In this paper, a simple distributed search engine system is implemented based on Nutch, an excellent open source distributed web crawler, and Elasticsearch, an excellent distributed full-text search server. The paper first introduces the fundamental principles and system architecture of a search engine, then introduces the open source technologies related to the distributed search engine to be implemented, such as Nutch, Lucene, Elasticsearch, and Apache Hadoop. On the basis of these technologies, a distributed search engine system integrating Nutch and Elasticsearch is proposed. In this system, Nutch is mainly responsible for collecting web page data from the Internet, while Elasticsearch works as a full-text retrieval server that indexes the data collected by Nutch and provides the search service. During development of the search engine system, IK Analyzer is integrated to improve support for Chinese, and a Web program is implemented to search through Elasticsearch. Finally, experimental tests show that the system can fetch web pages quickly, provides good search service, and supports Chinese search well.
Keywords/Search Tags: Nutch, Elasticsearch, Distributed, Search Engine
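The division of labor described in the abstract (Nutch collects pages, Elasticsearch indexes them with a Chinese analyzer, a Web program issues queries) can be illustrated with a minimal sketch of the two request bodies the Elasticsearch side would need. This is not the thesis's implementation. The field names `url`, `title`, and `content`, the helper function names, and the title boost are hypothetical. The analyzer names `ik_max_word` and `ik_smart` are those provided by the IK analysis plugin for Elasticsearch, and the mapping syntax assumes a recent Elasticsearch release (typeless mappings and the `text` field type), which may differ from the version used in 2016. The code only builds JSON bodies and does not contact a server.

```python
import json


def build_index_body(analyzer="ik_max_word", search_analyzer="ik_smart"):
    """Build an index-creation body that analyzes page text with IK.

    Field names are illustrative assumptions, not taken from the thesis.
    """
    text_field = {
        "type": "text",
        "analyzer": analyzer,                # fine-grained segmentation at index time
        "search_analyzer": search_analyzer,  # coarser segmentation at query time
    }
    return {
        "mappings": {
            "properties": {
                "url": {"type": "keyword"},   # stored as-is, not tokenized
                "title": dict(text_field),
                "content": dict(text_field),
            }
        }
    }


def build_search_body(query_text, size=10):
    """Build a full-text query over title and content with highlighting."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": query_text,
                # Hypothetical choice: weight title matches twice as much.
                "fields": ["title^2", "content"],
            }
        },
        "highlight": {"fields": {"content": {}}},
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
    print(json.dumps(build_search_body("分布式搜索引擎"), indent=2, ensure_ascii=False))
```

In such a setup the first body would be sent once when the index is created, before Nutch pushes crawled pages into it, and the second would be sent by the Web program for each user query.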