Design Of Search Engine Based On Lucene And Heritrix

Posted on: 2016-01-24  Degree: Master  Type: Thesis
Country: China  Candidate: Z W Li  Full Text: PDF
GTID: 2308330464963127  Subject: Computer technology
Abstract/Summary:
With the rapid development of Internet technology and the growth of online information, it has become necessary to improve the accuracy and timeliness of general search engines when they handle massive amounts of network information. Search engines combine Internet technology with computer application technology. Vertical search engines, by contrast, target a specific field, so users can find information among vast amounts of data more accurately, quickly, conveniently, and professionally.
Drawing on the current state of search engine development in China and abroad, this thesis discusses the principles and design of a vertical search engine based on Lucene and Heritrix. After discussing the system structure of Internet search engines, it introduces in detail Lucene, an open-source full-text search toolkit; Heritrix, an open-source web crawler written in Java; and the web server. Finally, it designs and develops a vertical search engine for book information.
A vertical search engine, also called a professional search engine, is a retrieval tool for a particular topic. It mainly addresses the shortcomings of general search engines, such as an overwhelming volume of results, low accuracy, and shallow content. Its main characteristic is that it extracts structured data from unstructured data. Lucene owes much of its strength as a full-text search toolkit to its extensive use of object-oriented design. Heritrix, a crawler with powerful data-capture capabilities, fetches data from selected web pages; the crawled content is then organized in a database, and the server finally returns the book information that matches the user's query.
Using Lucene and Heritrix, the thesis presents a detailed design and implementation: a web crawler fetches pages from book websites, and the book information is extracted, stored in structured form, and indexed, so that end users can find the book information they need more accurately.
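The extract, index, and search steps described above can be illustrated with a small sketch. This is not the thesis's implementation and it does not use Lucene's or Heritrix's APIs: it is a toy in-memory inverted index in plain Java, assuming that the extraction step has already turned crawled pages into structured records. The class `BookIndexSketch`, the record type `Book`, its `title` and `author` fields, and the sample titles are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy inverted index over structured book records (illustrative only, not Lucene). */
class BookIndexSketch {

    /** Hypothetical structured record, as the extraction step might produce it. */
    static final class Book {
        final String title;
        final String author;

        Book(String title, String author) {
            this.title = title;
            this.author = author;
        }
    }

    private final List<Book> books = new ArrayList<>();
    // term -> ids (positions in `books`) of the records that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Lower-cases the text and splits it on anything that is not a letter or digit. */
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Stores the record and adds each of its terms to the postings map. */
    void add(Book book) {
        int id = books.size();
        books.add(book);
        for (String term : tokenize(book.title + " " + book.author)) {
            postings.computeIfAbsent(term, k -> new LinkedHashSet<>()).add(id);
        }
    }

    /** Returns the books that contain every term of the query (AND semantics). */
    List<Book> search(String query) {
        Set<Integer> hits = null;
        for (String term : tokenize(query)) {
            Set<Integer> ids = postings.get(term);
            if (ids == null) {
                return new ArrayList<>(); // an unknown term means no record matches
            }
            if (hits == null) {
                hits = new LinkedHashSet<>(ids);
            } else {
                hits.retainAll(ids);
            }
        }
        List<Book> result = new ArrayList<>();
        if (hits != null) {
            for (int id : hits) {
                result.add(books.get(id));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BookIndexSketch index = new BookIndexSketch();
        index.add(new Book("Introduction to Search Engines", "Author A"));
        index.add(new Book("Web Crawling Basics", "Author B"));
        for (Book b : index.search("search engines")) {
            System.out.println(b.title + " / " + b.author);
        }
    }
}
```

A production system built on Lucene would replace this class with Lucene's own analysis, indexing, and query components, which also provide relevance ranking; the sketch only shows why structured records make field-level search straightforward.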
Keywords/Search Tags: Vertical search engine, Lucene, Heritrix, Web crawler