
Research And Implementation Of Scientific Topic Search Engine Crawler Based On Nutch

Posted on: 2012-05-20 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Ren
GTID: 2178330338992282 | Subject: Computer application technology
Abstract/Summary:
With the growing demand for personalized search, general-purpose search engines can no longer meet every need of their users. Topical search engines offer high efficiency, specialization, clear targeting, high accuracy, timeliness, and personalization, so they are gaining acceptance among more and more users. As a result, topical search engines have become an active research area.

In addition, Nutch is highly transparent: any organization or individual can inspect how the search engine works, and its configuration is flexible, so users can customize it to their own needs. Long-term practical use has shown that Nutch runs very stably. Nutch therefore provides a good research platform for anyone interested in search engines.

The main content of this thesis is the research and implementation of a science-and-technology topical search engine crawler based on Nutch. Science and technology was chosen as the topic because it can not only enrich people's spiritual life and update their thinking, but is also of considerable significance in combating superstition. In this context, I designed and implemented a science-and-technology topical search engine system.

In this thesis, Nutch is used as the development platform and extended through secondary development to design and implement a topical search engine for science and technology, and the development process and methods of the system are described in detail. After an extensive literature review, a study of how topical search engines work and of the crawling strategies of topical crawlers, and an analysis and comparison of various search strategies, and in view of the advantages of genetic algorithms for search engines, we apply the idea of the genetic algorithm to the crawling strategy. The system consists of a crawling module, a preprocessing module, and a query service module. Nutch itself is only a search platform: it is highly scalable and has a plug-in mechanism, but its Chinese word segmentation is very poor.
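The abstract does not state how the genetic-algorithm idea is mapped onto crawling. The Python sketch below shows one possible mapping, which is hypothetical. Pages are individuals, and fitness is topic relevance, which is assumed here to be the cosine similarity between the term-frequency vectors of the topic keywords and the page text. Selection keeps the most relevant pages of each generation. A loose form of crossover takes the parents' outlinks as offspring and ranks links shared by several parents first. Mutation occasionally injects a random unvisited page so the crawl can leave a local region of the link graph. The in-memory `web` dictionary and the names `relevance` and `ga_crawl` are assumptions for illustration. This is neither the thesis's implementation nor a Nutch API.

```python
import math
import random
from collections import Counter


def relevance(topic_terms, page_terms):
    """Topic relevance as cosine similarity of term-frequency vectors (assumed measure)."""
    t, p = Counter(topic_terms), Counter(page_terms)
    dot = sum(count * p[term] for term, count in t.items())
    norm = math.sqrt(sum(c * c for c in t.values())) * math.sqrt(sum(c * c for c in p.values()))
    return dot / norm if norm else 0.0


def ga_crawl(web, seeds, topic_terms, generations=3, pop_size=2, mutation_rate=0.1, rng=None):
    """GA-inspired focused crawl over a hypothetical in-memory web.

    web maps url -> (terms, outlinks). Returns (url, fitness) pairs, best first.
    """
    if rng is None:
        rng = random.Random(0)  # seeded so runs are reproducible
    fitness = {}  # url -> topic relevance of every page "fetched" so far
    population = list(seeds)
    for _ in range(generations):
        # Fitness evaluation: score each newly reached page that exists in the web.
        for url in population:
            if url in web and url not in fitness:
                fitness[url] = relevance(topic_terms, web[url][0])
        # Selection: the pop_size fittest pages of this generation become parents.
        scored = [u for u in population if u in fitness]
        parents = sorted(scored, key=fitness.get, reverse=True)[:pop_size]
        # Crossover (loose analogue): offspring are the parents' unvisited outlinks,
        # with links shared by several parents ranked first.
        votes = Counter(link for u in parents for link in web[u][1] if link not in fitness)
        children = [link for link, _ in votes.most_common()]
        # Mutation: occasionally add a random unvisited page to escape local optima.
        unvisited = [u for u in web if u not in fitness and u not in votes]
        if unvisited and rng.random() < mutation_rate:
            children.append(rng.choice(unvisited))
        if not children:
            break  # frontier exhausted
        population = children
    return sorted(fitness.items(), key=lambda kv: kv[1], reverse=True)
```

With `mutation_rate=0.0` the crawl is deterministic. Off-topic pages that are reachable only through low-fitness parents are never fetched, which is the pruning a topical crawler aims for. A nonzero mutation rate gives up some of that precision in exchange for broader coverage.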
In this thesis, the Paoding (Paodingjieniu) Chinese word segmenter is integrated into Nutch to improve Chinese word segmentation.

Finally, the system was deployed on a Tomcat server and tested experimentally to verify the feasibility of the method. The results agree with expectations and show that the method improves the accuracy of search results.
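Paoding relies on word dictionaries, but its internals are not described in this abstract. The Python snippet below is therefore only a generic illustration of why dictionary-based segmentation helps Chinese indexing. It does not reproduce Paoding's algorithm. It uses forward maximum matching, which scans the text from left to right and, at each position, takes the longest word found in the dictionary. When no dictionary word matches, it falls back to a single character. The function name `fmm_segment` and the small dictionary are hypothetical.

```python
def fmm_segment(text, dictionary, max_len=None):
    """Forward maximum matching: at each position take the longest dictionary word.

    Characters that start no dictionary word are emitted as single-character tokens.
    """
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match (or one character).
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens
```

For example, `fmm_segment("科技主题搜索引擎", {"科技", "主题", "搜索", "引擎", "搜索引擎"})` returns `["科技", "主题", "搜索引擎"]`. A naive analyzer that emits one token per character would instead produce eight single-character tokens that carry little meaning on their own.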
Keywords/Search Tags: Genetic algorithms, topical search engine, Nutch, crawler, topic relevance