The Research And Implementation Of Enterprise Search Engine Based On Nutch

Posted on:2012-01-31

Degree:Master

Type:Thesis

Country:China

Candidate:B Chen

Full Text:PDF

GTID:2218330362457689

Subject:Computer technology

Abstract/Summary:

With the development of information technology, the volume of information inside a modern enterprise is growing explosively. This volume makes it difficult for employees to find useful information and lowers their efficiency, so searching an enterprise's internal information has become a hot topic. Traditional enterprise search engines often use the B/S (browser/server) architecture. Because this architecture scales poorly, it runs into bottlenecks in computing capacity, storage, and network bandwidth once the enterprise's data grows beyond its capacity. Based on a detailed study of the open source search engine Nutch and its related technologies, a complete enterprise search engine with a distributed processing architecture was designed. According to the characteristics and update patterns of the data sources, three crawlers were designed to crawl document, database, and website data. In this system, the collecting, indexing, and searching sub-systems all work in a distributed manner. The collecting module uses the MapReduce programming model to crawl data and stores the analyzed data in the original database; the indexing module reads data from the original database and creates an index database; the search module returns results by searching the index database. All of the sub-systems communicate with each other through the distributed file system HDFS. Tests show that the system successfully completes real-time indexing of the different data sources in a distributed processing environment and achieves the intended goal.
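
The flow from original database to index database to search can be illustrated with a minimal in-memory sketch. It is not the thesis's implementation and uses neither the Nutch nor the Hadoop API: the names ORIGINAL_DB, map_phase, shuffle, reduce_phase, build_index, and search are hypothetical, the sample documents are invented for illustration, and a plain dictionary stands in for data that the real system would keep on HDFS.

```python
from collections import defaultdict

# Hypothetical stand-in for the "original database": doc id -> crawled text.
ORIGINAL_DB = {
    "doc1": "nutch distributed crawler",
    "doc2": "enterprise search with nutch",
    "doc3": "distributed file system",
}


def map_phase(doc_id, text):
    """Map step: emit one (term, doc_id) pair per distinct term in a document."""
    for term in set(text.lower().split()):
        yield term, doc_id


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(term, doc_ids):
    """Reduce step: collapse the doc ids for one term into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_index(original_db):
    """Run map, shuffle, and reduce in sequence to build an inverted index."""
    pairs = (
        pair
        for doc_id, text in original_db.items()
        for pair in map_phase(doc_id, text)
    )
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


def search(index, query):
    """Return ids of documents that contain every query term (boolean AND)."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []


INDEX = build_index(ORIGINAL_DB)
print(search(INDEX, "nutch"))
print(search(INDEX, "distributed nutch"))
```

In a real MapReduce deployment the map and reduce functions would run in parallel on separate nodes and the framework would perform the grouping; here everything runs in one process purely to show the data flow.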

Keywords/Search Tags:

Nutch, Enterprise Search, Distributed Processing, Distributed Crawlers

Related items

1	Research And Implementation Of Distributed Search Engine Based On Nutch
2	Research And Implementation Of Search Engine Based On Nutch Architecture
3	A Design And Implementation Of Distributed Enterprise Search
4	Research And Design Of Distributed Vertical Search Engine Based On Hadoop
5	Design And Implementation Of Distributed Logistics Vertical Search Engine Based On ElasticSearch
6	The Research And Implementation Of Distributed Topic Web Crawler Based On Nutch
7	Research And Implementation Of Distributed Mongolian Search Engine System
8	Distributed Search Engine Research
9	Research And Implementation Of Template Based WEB News Searching Technology
10	Research And Optimization Of Distributed Crawler System Based On Nutch