The Design And Implementation Of Second-Full-Text Retrieval System Based On Lucene

Posted on: 2010-07-14; Degree: Master; Type: Thesis
Country: China; Candidate: D W Wu; Full Text: PDF
GTID: 2178330332988393; Subject: Education Technology
Abstract/Summary:
As society becomes more information-driven, the demand for fast and accurate information retrieval keeps growing. This paper designs and implements a full-text retrieval system that supports retrieval across multi-format documents, by introducing the open source tools PDFBox and POI and by modifying the core index module of Lucene. On top of the existing indexing of HTML and TXT documents through the Lucene API, this paper adds indexing of DOC, XLS, PDF and other formats, and thereby meets the requirement of retrieving from multi-format documents. To locate the retrieval keywords more accurately, this paper designs and implements a new second-index algorithm. The second index stores each keyword's page number, coordinates, context and related information, which can be used to locate the keywords on specific pages of the result books and to mark their coordinates. As a result, the second retrieval in PDF documents behaves much like Google Books. Test results show that both the first and the second retrieval achieve high recall and precision, and that the response time of both retrievals is on the order of milliseconds or less. Every performance indicator of the system meets the requirements of full-text search applications, so the system has good application prospects and commercial value.
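To make the idea of a second index concrete, the following is a minimal sketch of a structure that stores, for each keyword, the page number, coordinates and context of every occurrence. It is not the thesis's algorithm, which the abstract does not describe in detail. The names `Hit`, `SecondIndex`, `add_page` and `lookup` are hypothetical, the input is assumed to be a list of words with coordinates that a PDF text extractor has already produced, and plain Python dictionaries stand in for Lucene.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One occurrence of a keyword (hypothetical record layout)."""
    page: int      # 1-based page number
    x: float       # assumed coordinates of the word on the page
    y: float
    context: str   # a few surrounding words


class SecondIndex:
    """Maps a lowercased keyword to the list of its occurrences."""

    def __init__(self, context_words=2):
        self.context_words = context_words
        self._postings = defaultdict(list)

    def add_page(self, page, words):
        """Index one page; `words` is a list of (text, x, y) in reading order."""
        texts = [w[0] for w in words]
        for i, (text, x, y) in enumerate(words):
            lo = max(0, i - self.context_words)
            hi = i + self.context_words + 1
            context = " ".join(texts[lo:hi])
            self._postings[text.lower()].append(Hit(page, x, y, context))

    def lookup(self, keyword):
        """Return all occurrences of `keyword`, in insertion order."""
        return list(self._postings.get(keyword.lower(), []))


# Usage with made-up words and coordinates.
index = SecondIndex(context_words=1)
index.add_page(1, [("Lucene", 72.0, 700.0), ("builds", 120.0, 700.0),
                   ("indexes", 170.0, 700.0)])
index.add_page(2, [("Query", 72.0, 650.0), ("the", 110.0, 650.0),
                   ("Lucene", 135.0, 650.0), ("index", 185.0, 650.0)])
hits = index.lookup("lucene")
for hit in hits:
    print(hit.page, hit.x, hit.y, hit.context)
```

A viewer could use the page number to jump to the right page and the coordinates to draw a highlight, which is the kind of in-book keyword marking the abstract compares to Google Books.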
Keywords/Search Tags: Full-Text Retrieval, Second Index, Second Retrieval, Lucene