The Research And Implementation Of Distributed Topic Web Crawler Based On Nutch

Posted on:2019-07-25

Degree:Master

Type:Thesis

Country:China

Candidate:X Jing

GTID:2428330548979587

Subject:Computer Science and Technology

Abstract/Summary:

In the age of information explosion, it is important to know how to use search engines to find useful information accurately. Although users can find the information they care about with a general-purpose search engine, the search results also contain a great deal of irrelevant information. The topic-specified web crawler is an important component of a topic search engine, and because topic search engines are needed to improve the precision of information retrieval, the study of topic-specified web crawlers has great theoretical value and practical significance. Big data processing tools such as Hadoop and Spark have been developed to handle computation tasks on big data; they use a computer cluster to carry out tasks that would usually be processed by computers with massive memory. After studying topic-specified search technologies, the open-source search engine Nutch, and the learning automaton algorithm, this thesis proposes a topic-specified distributed web crawler based on an improved learning automaton algorithm. The crawler modifies the Fetch and Parse modules of Nutch and uses multiple seed URL acquisition strategies to improve the precision, recall, and running efficiency of topic crawling, enabling the crawler to adapt to the web. Finally, a set of simulation experiments was conducted to evaluate the performance of the proposed crawler. The simulation study showed that the proposed crawler achieves better precision and efficiency.
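The abstract does not describe how the improved learning automaton works. As a point of reference only, the sketch below shows the classic linear reward-inaction (L_R-I) scheme from the learning automata literature; it is not the thesis's improved variant. It assumes, hypothetically, that each action is a candidate seed URL acquisition strategy and that the reward signal is "the fetched page was judged on-topic". The names `LearningAutomaton` and `simulate` and the relevance rates used in the usage example are illustrative assumptions, not part of Nutch or of the thesis.

```python
import random
from typing import List, Optional


class LearningAutomaton:
    """Variable-structure learning automaton with the classic L_R-I update.

    Hypothetical illustration: actions could stand for seed URL acquisition
    strategies; this is NOT the improved algorithm proposed in the thesis.
    """

    def __init__(self, n_actions: int, learning_rate: float = 0.1,
                 seed: Optional[int] = None) -> None:
        if n_actions < 1:
            raise ValueError("n_actions must be positive")
        if not 0.0 < learning_rate < 1.0:
            raise ValueError("learning_rate must be in (0, 1)")
        # Start with a uniform action-probability vector.
        self.p: List[float] = [1.0 / n_actions] * n_actions
        self.a = learning_rate
        self.rng = random.Random(seed)

    def choose(self) -> int:
        # Sample an action index according to the current probabilities.
        return self.rng.choices(range(len(self.p)), weights=self.p, k=1)[0]

    def update(self, action: int, rewarded: bool) -> None:
        # L_R-I: on reward, shift probability mass toward the chosen action;
        # on penalty, leave the vector unchanged ("inaction").
        if not rewarded:
            return
        for j in range(len(self.p)):
            if j == action:
                self.p[j] += self.a * (1.0 - self.p[j])
            else:
                self.p[j] *= (1.0 - self.a)


def simulate(relevance_rates: List[float], steps: int = 2000,
             seed: int = 0) -> List[float]:
    # Hypothetical environment: action i yields an on-topic page with
    # probability relevance_rates[i].
    la = LearningAutomaton(len(relevance_rates), learning_rate=0.05, seed=seed)
    env = random.Random(seed + 1)
    for _ in range(steps):
        i = la.choose()
        la.update(i, env.random() < relevance_rates[i])
    return la.p


if __name__ == "__main__":
    print([round(x, 3) for x in simulate([0.2, 0.5, 0.8])])
```

Because each reward only moves probability mass between actions, the vector always sums to 1, and actions that more often lead to on-topic pages tend to accumulate probability over time. The L_R-I scheme is only epsilon-optimal, so with a finite learning rate it can occasionally lock onto a weaker action; this kind of limitation is one motivation for the modified schemes that work such as this thesis describes.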

Keywords/Search Tags:

search engine, topic-specified web crawler, distribution, Nutch, learning automaton

Related items

1	Research And Implementation Of Scientific Topic Search Engine Crawler Based On Nutch
2	The Design And Implementation Of WEB Crawler And Topic Search Engine Based On Nutch
3	The Topic Of Science And Technology Projects Search Engine Based On Nutch
4	Inquisition Of Nutch's Application On Searching Network-based Learning Resources
5	Design And Implementation For Topic Specific Meta Search Engine Based On Web Data Mining
6	Research And Design Of Blog-oriented Vertical Search Engine With The It Technology As The Theme On The Basis Of Nutch
7	The Research And Implementation On Lucene-Based Topic Search Engine
8	Design And Implementation Of IT-oriented Distributed Topic Crawler
9	BT Forum Oriented Search Engine And Mobile Application Technology
10	Research And Implementation On Key Techniques Of Topic Search Engine