Research And Implementation Of Search Engine Based On Nutch Architecture

Posted on: 2012-12-25 | Degree: Master | Type: Thesis
Country: China | Candidate: J R Yu
GTID: 2178330335460029 | Subject: Computer application technology
Abstract/Summary:
With the arrival of the Web 2.0 era, search engines play an increasingly important role on the Internet. As the user base grows and matures, users demand more of search engines, whose functions are continually being enriched and refined. Nutch is an excellent open-source search engine, and this thesis uses the Nutch project as the basis for studying the distributed implementation of a search engine. A search engine usually consists of a data-fetching module, an indexing module, and a retrieval module. We study in detail the general framework, working principles, components, and workflow of search engines. Based on an in-depth study of each component of the Nutch system, we propose a concrete design for a distributed search engine platform. By combining Nutch with the Hadoop distributed computing platform, we make the Nutch search engine distributed, so that data fetching can be completed efficiently in parallel; data indexing is likewise carried out in parallel. During retrieval, the Web server provides a unified retrieval entry point: the user's request is sent through the IPC mechanism to each child node in the cluster, each child node searches its own local index data, and the results are returned to the Web server, which completes the retrieval. In addition, to meet the actual needs of Chinese search engine users, we improve the default Chinese word segmentation module by integrating the Paoding component, which improves the search engine's results to a certain extent. Finally, we deploy Ganglia, an open-source distributed cluster monitoring project, to monitor the working state of the Nutch cluster in real time, so that the cluster can be adjusted promptly; Ganglia thus serves as a useful aid to the operation of the Nutch cluster.
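The retrieval flow described in the abstract (one entry point, a request fanned out to every child node, each node searching its local index, results merged at the Web server) can be illustrated with a minimal sketch. This is not the thesis's implementation and does not use the Nutch or Hadoop APIs: `ShardNode`, `distributed_search`, the term-count scoring, and the sample documents are all hypothetical, and a thread pool stands in for the IPC calls.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


class ShardNode:
    """Hypothetical stand-in for one child node holding a local index."""

    def __init__(self, name, docs):
        self.name = name
        self.docs = docs  # maps doc_id -> document text

    def search(self, query, k):
        """Score local documents by raw term count and return the top k."""
        terms = query.lower().split()
        hits = []
        for doc_id, text in self.docs.items():
            words = text.lower().split()
            score = sum(words.count(t) for t in terms)
            if score > 0:
                hits.append((score, doc_id, self.name))
        return heapq.nlargest(k, hits)


def distributed_search(nodes, query, k=3):
    """Fan the query out to every node, then merge the partial results.

    Threads are an assumption used here in place of a real IPC mechanism.
    """
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda n: n.search(query, k), nodes))
    merged = [hit for part in partials for hit in part]
    return heapq.nlargest(k, merged)


# Two hypothetical nodes, each with its own small local index.
nodes = [
    ShardNode("node-1", {"a": "nutch search engine", "b": "hadoop cluster"}),
    ShardNode("node-2", {"c": "nutch nutch crawler", "d": "ganglia monitor"}),
]
results = distributed_search(nodes, "nutch", k=2)
for score, doc_id, node in results:
    print(score, doc_id, node)
```

Each node returns only its own top k hits, so the entry point merges at most k results per node rather than every matching document; a real deployment would also need timeouts and handling for nodes that fail to respond.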
Keywords/Search Tags:Nutch, Search-Engine, Distributed-Computing, Hadoop, Paoding-Module