
Studies and Examples of Building a Search Engine Based on Lucene and Heritrix

Posted on:2009-08-27
Degree:Master
Type:Thesis
Country:China
Candidate:Y J Liu
Full Text:PDF
GTID:2208360245461126
Subject:Computer application technology
Abstract/Summary:
With the rapid growth of information on the Web, increasing attention is being paid to how to retrieve potentially useful information from this enormous volume of data and apply it effectively in management and decision-making.

A Web search engine is a special kind of web page for retrieving information on the Internet. It collects web pages through robots called crawlers, analyzes the original pages, and stores the resulting information in databases. When a user enters keywords, the search engine searches the indexes in its database and returns the relevant web pages.

Since 1994, Web search engines have evolved through three stages: centralized search, distributed search, and intelligent search. Current work focuses mainly on automated search, smart classification, and intelligent analysis. In the future, research is expected to expand into areas such as multimedia search, specialized search, and cross-language search to meet users' varied requirements.

This thesis first introduces Lucene, a technology widely used in the software industry for developing search engines, and analyzes its architecture and main working principles. It then introduces the core components of the Web crawler Heritrix. Next, it customizes some Heritrix components according to the requirements of a search engine demo and completes the demo. The final chapter analyzes advanced search techniques in depth, based on source code analysis, and presents schemes for search optimization and performance improvement. The appendix analyzes Lucene's Analyzer, then implements a Chinese Analyzer and adds it to the demo, improving the accuracy and coverage of the search results.
Keywords/Search Tags:Search Engine, Web Crawler, Lucene, Heritrix, Performance improvement