
Implementation of a LAN Content-Indexing Search Engine Based on Lucene

Posted on: 2013-09-14
Degree: Master
Type: Thesis
Country: China
Candidate: G Qu
Full Text: PDF
GTID: 2248330395491045
Subject: Computer technology
Abstract/Summary:
The network now reaches into every aspect of people's lives, and the resources it holds are vast and rich. The question that follows is how to search effectively for the information one needs, because pinpointing what one really wants in such a huge body of information is genuinely difficult. The most effective solution is a search engine, which helps users quickly locate the resources they are looking for. Public search engines such as Google, Baidu, and Bing cover only the Internet, so using them to search data sources such as an intranet is inconvenient or impossible. This thesis therefore designs a search engine for this kind of need, one that can be extended to retrieve the contents of unstructured documents on the internal networks of businesses and schools.
The thesis first introduces the key technologies used in the design (Lucene, Ajax, and server push), together with the requirements and how these technologies work. It then analyzes a framework for the search engine and, on this basis, divides it into three main modules: the search robot module, the index module, and the search module. Diagrams describe each module and the low coupling between them. Each of the three modules is then analyzed, designed, and implemented. First, the search robot module feeds the network library and the index library, and its efficient, flexible data acquisition lays the groundwork for creating the database. Second, the index module is the basis for efficient data retrieval: how file contents are indexed and how the index data is stored directly affect search speed and, in turn, the user experience.
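The abstract describes the search robot module only at this level. As a rough illustration, here is a breadth-first crawl loop in Java, the language of the thesis's J2EE stack. It uses a `HashSet` in the role of the URL library so that no page is queued twice. This is a sketch under stated assumptions, not the thesis's code. The class name `LanCrawler` is hypothetical, and an in-memory map from each URL to its outgoing links stands in for real HTTP fetching and HTML link extraction, both of which are omitted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical breadth-first crawler sketch. Assumption: linkGraph (URL -> outgoing links)
 * replaces fetching pages over the LAN and parsing their links.
 */
class LanCrawler {
    private final Map<String, List<String>> linkGraph;
    // Stand-in for the URL library: persists across crawl() calls,
    // so URLs seen in an earlier crawl are not queued again.
    private final Set<String> urlLibrary = new HashSet<>();

    LanCrawler(Map<String, List<String>> linkGraph) {
        this.linkGraph = linkGraph;
    }

    /** Crawls breadth-first from seed, following at most maxDepth hops; returns URLs in visit order. */
    List<String> crawl(String seed, int maxDepth) {
        List<String> visited = new ArrayList<>();
        Deque<Map.Entry<String, Integer>> frontier = new ArrayDeque<>(); // (url, depth) pairs
        frontier.add(Map.entry(seed, 0));
        urlLibrary.add(seed);
        while (!frontier.isEmpty()) {
            Map.Entry<String, Integer> next = frontier.poll();
            String url = next.getKey();
            int depth = next.getValue();
            // A real crawler would fetch the page here and hand it to the index module.
            visited.add(url);
            if (depth >= maxDepth) {
                continue; // do not expand links beyond the depth limit
            }
            for (String link : linkGraph.getOrDefault(url, List.of())) {
                // Set.add returns false for URLs already in the library, so each is queued once.
                if (urlLibrary.add(link)) {
                    frontier.add(Map.entry(link, depth + 1));
                }
            }
        }
        return visited;
    }
}
```

The depth limit bounds the crawl on an intranet whose links may form cycles. The `HashSet` is only the simplest stand-in for the URL library, which the thesis proposes moving into an in-memory database in future work.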
Because this design matters so much, the system uses Lucene's inverted index, which is more efficient than a traditional index. Third, the search module retrieves data for the user, pages results on the server to reduce the data transferred to the client, and adds Google Suggest-style Ajax effects to improve the user experience.
Finally, the thesis summarizes the design and implementation of the entire system and discusses future extensions. These mainly concern how to use distributed crawling, indexing, and retrieval, and how to use an in-memory database to build a more efficient URL library together with a distributed thread pool based on Hadoop MapReduce, in order to improve the crawling efficiency of the search robot.
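To make the inverted-index idea concrete, here is a toy Java sketch of the work the index and search modules do. It rests on several simplifying assumptions. It tokenizes on non-letter, non-digit characters instead of using a Lucene `Analyzer`, so Chinese text would need proper word segmentation. It does no relevance scoring, and multi-term queries use AND semantics. The class `ToyInvertedIndex` and its methods are hypothetical and are not Lucene's API; in Lucene, `IndexWriter` and `IndexSearcher` do the corresponding work. The `page` method shows returning one page of hits from the server, and `suggest` shows the prefix lookup behind a Google Suggest-style dropdown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Toy inverted index (hypothetical, not Lucene's API): maps each term to the
 * sorted set of document IDs that contain it. No scoring, naive tokenization.
 */
class ToyInvertedIndex {
    // TreeMap keeps terms sorted, which makes prefix suggestions a range scan.
    private final TreeMap<String, TreeSet<Integer>> postings = new TreeMap<>();

    /** Naive tokenizer (assumption): lower-case, split on non-letter/non-digit runs. */
    private static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Index module: add docId to the posting list of every term in text. */
    void addDocument(int docId, String text) {
        for (String term : tokens(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Search module: IDs of documents containing ALL query terms, in ascending order. */
    List<Integer> search(String query) {
        TreeSet<Integer> result = null;
        for (String term : tokens(query)) {
            TreeSet<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs); // copy so the index itself is not modified
            } else {
                result.retainAll(docs); // intersect posting lists
            }
        }
        return result == null ? List.of() : new ArrayList<>(result);
    }

    /** Server-side paging: only one page of hits is sent to the client (pages start at 1). */
    static List<Integer> page(List<Integer> hits, int pageNo, int pageSize) {
        if (pageNo < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNo and pageSize must be >= 1");
        }
        int from = Math.min((pageNo - 1) * pageSize, hits.size());
        int to = Math.min(from + pageSize, hits.size());
        return new ArrayList<>(hits.subList(from, to));
    }

    /** Suggest-style completion: up to limit indexed terms that start with prefix. */
    List<String> suggest(String prefix, int limit) {
        List<String> out = new ArrayList<>();
        String p = prefix.toLowerCase(Locale.ROOT);
        for (String term : postings.tailMap(p, true).keySet()) {
            if (out.size() == limit || !term.startsWith(p)) {
                break; // sorted order: once a term lacks the prefix, no later term has it
            }
            out.add(term);
        }
        return out;
    }
}
```

A query consults only the posting lists of its own terms instead of scanning every document's text. That is the usual reason an inverted index outperforms a scan over documents.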
Keywords/Search Tags: Index, Lucene, J2EE, Search, User Experience