Design And Implementation Of Heterogeneous Document Library Full-text Retrieval System

Posted on: 2017-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: D Pan
GTID: 2348330503472467
Subject: Computer technology
Abstract/Summary:
In the information era, information on the Internet is not the only thing growing exponentially: the internal documentation that enterprises accumulate over the long term keeps growing as well. People therefore face the problem of how to quickly find the information they need in a mass of data. The document retrieval system described here is based on Solr. It collects document metadata and parses text content to build an internal document index for a heterogeneous document library, giving users a tool to quickly retrieve the documents they want.
The full-text retrieval system adopts a B/S (browser/server) structure. By function, the server is divided into a file capture module, a document parsing module, a word segmentation module, an index management module, and an information retrieval module. The file capture module works in incremental update mode, regularly fetching modified documents from the heterogeneous document library: it uses JCIFS for directory-based document libraries, SVNKit for SVN and Polarion document libraries, and a database connection for document databases. The document parsing module parses various document types to extract their text content, using POI for Office documents, PDFBox for PDF documents, JDOM for XML documents, and a custom parser for Polarion documents. The word segmentation module integrates the open-source IKAnalyzer Chinese word segmentation tool into Solr to provide Chinese text segmentation. The index management module combines each document's content and metadata and writes them to the index library, which uses Solr's inverted index structure.
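The incremental update and inverted index ideas described above can be illustrated with a small, self-contained sketch. This is not the thesis implementation, which relies on Solr and Java libraries. The class name `InvertedIndex`, its methods, and the use of a modification timestamp to detect changed documents are all assumptions made for illustration, and the regular-expression tokenizer is only a stand-in for a real analyzer such as IKAnalyzer.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy in-memory inverted index with incremental updates (illustrative only)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.doc_terms = {}               # document id -> terms currently indexed
        self.doc_mtime = {}               # document id -> last indexed modification time

    @staticmethod
    def tokenize(text):
        # Simplistic stand-in for a real analyzer: lowercase runs of
        # ASCII letters and digits. Chinese text needs a proper segmenter.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def update(self, doc_id, text, mtime):
        """Index the document if it is new or modified; return True if indexed."""
        if self.doc_mtime.get(doc_id, float("-inf")) >= mtime:
            return False  # unchanged since the last capture run, so skip it
        # Remove stale postings before adding the new ones.
        for term in self.doc_terms.get(doc_id, ()):
            self.postings[term].discard(doc_id)
        terms = self.tokenize(text)
        for term in terms:
            self.postings[term].add(doc_id)
        self.doc_terms[doc_id] = terms
        self.doc_mtime[doc_id] = mtime
        return True

    def search(self, query):
        """Return the ids of documents that contain every query term."""
        terms = self.tokenize(query)
        if not terms:
            return set()
        return set.intersection(*(self.postings.get(t, set()) for t in terms))
```

A capture run would call `update` for every document it sees. Only documents whose timestamp has advanced are re-indexed, which is the point of the incremental mode.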
The information retrieval module exposes a server-side service interface that provides users with a search interface as well as interfaces for system settings and file upload.
The full-text retrieval system gives enterprises a search engine built on full-text retrieval and provides enterprise customers with a convenient, fast document search function. System testing shows that the server updates the index quickly and that document search response speed, recall, and precision all meet user requirements; the system functions are basically implemented.
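The abstract does not say how recall and precision were measured. As a reminder of the standard definitions only, the sketch below computes both for a single query. The function name `precision_recall` and the document ids in the usage note are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, if a query returns `d1` to `d4` while the relevant documents are `d1`, `d2`, and `d5`, two of the four results are relevant and two of the three relevant documents are found, so precision is 0.5 and recall is 2/3.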
Keywords/Search Tags: Incremental update, Web spider, Inverted index, Full-text search