The Design And Implementation Of The Heterogeneous Data Joint Retrieval System

Posted on: 2014-09-10
Degree: Master
Type: Thesis
Country: China
Candidate: W Gao
GTID: 2308330473953763
Subject: Software engineering

Abstract/Summary:
With the spread of computers and networks, more and more enterprises, agencies, and schools use computers to manage documents, and these management processes produce a large number of electronic documents. This raises an important problem: how to retrieve useful information from a huge volume of information resources effectively and accurately. The enterprise studied here faces this document-retrieval problem as well. It currently manages its documents through a directory structure and has no system for searching across all of them, so staff spend a lot of time finding information. The enterprise needs a search engine that can search all of its documents and meet the requirements of different users.

Based on the enterprise's requirements, this project studies the design and implementation of the indexing and search mechanisms of the Heterogeneous Data Joint Retrieval System. To make the search engine user-friendly, the system provides several search approaches, such as "search by document type", "search by publisher", and "search by published date". Because the enterprise holds a huge amount of data and requires accurate results, the system includes extensive optimizations to index construction, to the retrieval process, and to the Paoding Chinese analyzer.

The system is developed in Java and is implemented mainly with Lucene, a Java-based full-text indexing toolkit. Given the huge amount of data and future system upgrades, GreenPlum, a database server designed for large-capacity data processing, is used as the database. The system uses the SSH (Struts, Spring, Hibernate) framework, the POI and PDFBox toolkits for parsing documents, and Paoding Analyzer as the Chinese analyzer. After implementation, the system runs well and its functionality meets the requirements. In terms of retrieval results, the system largely achieves its original design targets.
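The field-specific search modes mentioned in the abstract (by document type, by publisher, by published date) amount to combining a full-text lookup with structured filters. The Java sketch that follows illustrates that idea with a toy in-memory inverted index. It is not Lucene's API and not the thesis's actual implementation: the class `FacetedIndex`, the `Doc` record, and its field names are hypothetical, and tokenization is a naive split on whitespace and ASCII punctuation that merely stands in for a real analyzer such as Paoding (unsegmented Chinese text would end up as a single token).

```java
import java.time.LocalDate;
import java.util.*;

/**
 * Hypothetical sketch: an in-memory inverted index plus field filters.
 * Not Lucene's API; real systems would delegate this to a full-text library.
 */
public class FacetedIndex {

    /** A document record; field names are illustrative assumptions. */
    public record Doc(int id, String type, String publisher, LocalDate published, String body) {}

    private final Map<Integer, Doc> docs = new HashMap<>();
    // term -> sorted ids of documents containing that term (the posting list)
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Naive placeholder tokenizer: lowercase, split on whitespace/ASCII punctuation. */
    private static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[\\s\\p{Punct}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Stores the document and adds its id to the posting list of each term. */
    public void add(Doc d) {
        docs.put(d.id(), d);
        for (String term : tokenize(d.body())) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(d.id());
        }
    }

    /**
     * Returns ids of documents containing every query term (AND semantics)
     * and matching each non-null filter; null means "do not filter".
     */
    public List<Integer> search(String query, String type, String publisher,
                                LocalDate from, LocalDate to) {
        Set<Integer> hits = null;
        for (String term : tokenize(query)) {
            Set<Integer> p = postings.getOrDefault(term, Collections.emptySet());
            if (hits == null) hits = new TreeSet<>(p);
            else hits.retainAll(p); // intersect posting lists
        }
        if (hits == null) hits = new TreeSet<>(docs.keySet()); // empty query: filters only
        List<Integer> result = new ArrayList<>();
        for (int id : hits) {
            Doc d = docs.get(id);
            if (type != null && !type.equals(d.type())) continue;
            if (publisher != null && !publisher.equals(d.publisher())) continue;
            if (from != null && d.published().isBefore(from)) continue;
            if (to != null && d.published().isAfter(to)) continue;
            result.add(id);
        }
        return result;
    }

    public static void main(String[] args) {
        FacetedIndex idx = new FacetedIndex();
        idx.add(new Doc(1, "report", "Finance", LocalDate.of(2013, 5, 1), "Annual budget report"));
        idx.add(new Doc(2, "memo", "Finance", LocalDate.of(2014, 1, 15), "Budget memo for Q1"));
        idx.add(new Doc(3, "report", "HR", LocalDate.of(2014, 3, 2), "Staff training report"));
        System.out.println(idx.search("budget", null, null, null, null));                  // [1, 2]
        System.out.println(idx.search("report", "report", "HR", null, null));              // [3]
        System.out.println(idx.search("budget", null, null, LocalDate.of(2014, 1, 1), null)); // [2]
    }
}
```

In Lucene itself, a comparable effect is usually obtained by indexing the type, publisher, and date as separate fields and combining the text query with filter clauses, rather than post-filtering as this toy does.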
Keywords/Search Tags: full-text retrieval, Lucene, index, Chinese analyzer, GreenPlum