
The Research And Implementation On Lucene-Based Topic Search Engine

Posted on: 2011-07-26  Degree: Master  Type: Thesis
Country: China  Candidate: D P Wang
GTID: 2178360305952211  Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, the amount of information stored on the Web has grown geometrically. Finding the information one needs within this volume is a challenging task. General-purpose search engines address this need to some extent, but their generality limits how accurately and effectively people can reach the information resources they want. Obtaining valuable information on the Internet therefore requires a cost-effective solution. Specialized search engines, because of their domain focus, can effectively avoid "interfering information". As a result, they can greatly improve the accuracy of search results, and they have become a research hotspot in this field.

This thesis is built on the open-source Lucene full-text retrieval toolkit and takes a search engine for employment (job) information as its subject. It covers three main areas: Web data collection, Web page indexing, and the sorting of Web search results. The design of the result-sorting algorithm is the core content of the thesis.

The thesis starts from the basic theory of search engines and explains the Lucene indexing algorithm. It then compares the various sorting methods available in Lucene in terms of their theory, their conditions and scope of application, and their advantages and disadvantages, and describes in detail the theory of weighting document fields (Field) during Lucene indexing. Based on the employment information and data of the Hebei SouCai website, and with reference to the characteristics of employment information resources, the thesis constructs a model and optimizes the sorting of the search engine's results.

Theoretical research and practical tests show that the Lucene document field weighting (Boost) algorithm, which assigns weights to specific fields at the indexing stage, outputs satisfactory search results quickly and accurately, and is a more scientific and practical result-sorting algorithm. Finally, some drawbacks and directions for future work are presented.
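
The idea of weighting specific fields so that matches in them count for more can be illustrated with a minimal Python sketch. It is not Lucene's scoring formula and not the thesis's actual model: the field names (`title`, `company`, `description`), the boost values, and the helper functions `tokenize`, `score`, and `search` are all hypothetical, chosen only to show how a field weight changes the order of results for a job-posting query.

```python
from collections import Counter

# Hypothetical field weights for a job posting; the values are illustrative only
# and are not taken from the thesis.
FIELD_BOOSTS = {"title": 3.0, "company": 1.5, "description": 1.0}


def tokenize(text):
    """Lowercase and split on whitespace; a real analyzer would do much more."""
    return text.lower().split()


def score(doc, query, boosts=FIELD_BOOSTS):
    """Sum, over fields, the boosted count of query-term occurrences."""
    terms = tokenize(query)
    total = 0.0
    for field, boost in boosts.items():
        counts = Counter(tokenize(doc.get(field, "")))
        total += boost * sum(counts[t] for t in terms)
    return total


def search(docs, query, boosts=FIELD_BOOSTS):
    """Return (score, doc) pairs with a nonzero score, highest score first."""
    scored = [(score(d, query, boosts), d) for d in docs]
    hits = [pair for pair in scored if pair[0] > 0]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


docs = [
    {"title": "Accountant", "description": "java is not required"},
    {"title": "Java developer", "description": "build search services"},
]
results = search(docs, "java")
```

In this sketch both postings contain the query term once, but the posting that matches in the more heavily weighted `title` field is ranked first. Lucene applies its own similarity formula; the sketch only shows the effect of a per-field weight on the ordering.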
Keywords/Search Tags: Search Engine, Topic Search Engine, Lucene, Nutch, Sorting algorithm