
Multi-document Retrieval System Design And Development

Posted on: 2011-10-02
Degree: Master
Type: Thesis
Country: China
Candidate: H S Wang
GTID: 2208330332477307
Subject: Software engineering
Abstract/Summary:
With the rapid development of computer and network technology, the number of electronic documents in a variety of formats has expanded rapidly, and Word documents account for a large share of them. Finding the needed information in this vast body of documents quickly and effectively has become a real requirement, and full-text retrieval systems emerged to meet it. Full-text search is an important branch of modern information retrieval technology: it makes finding specific information in large, complex collections of data far more efficient. The main task of this study is to design and develop a multi-document full-text retrieval system for Word-format documents that traverses and searches a specified directory or file directory, giving users a fast and secure channel for information retrieval.

This thesis studies Chinese full-text search technology in some depth. The word-based full-text index uses CLucene's inverted index structure and is built block by block: a small index is created for newly added files, merged into the existing index, and the result is then optimized. Compared with traditional index structures, this makes the index easier to build, update, and maintain, and it effectively improves indexing speed.

Because CLucene currently handles only plain-text data, this work uses VBA automation and related Microsoft Office technologies to build a text extraction tool that converts Word documents into TXT plain-text files; CLucene's indexing mechanism then indexes this large body of documents.
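The block-by-block indexing described above can be illustrated with a small in-memory inverted index. This is a minimal Python sketch, not CLucene's C++ API: the names `build_index`, `merge_indexes`, and `search_all` are hypothetical, documents are assumed to be already extracted to text and tokenized, and document ids are assumed to be globally unique (a real Lucene-style engine renumbers documents when it merges index segments).

```python
from collections import defaultdict


def build_index(docs):
    """Build an inverted index mapping each term to a sorted list of doc ids.

    docs maps a document id to its list of terms; tokenization (for example
    Chinese word segmentation) is assumed to have been done upstream.
    """
    postings = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def merge_indexes(main, small):
    """Merge a small index built for new files into the main index.

    Posting lists of shared terms are unioned and kept sorted; the main
    index is copied, not modified, and existing documents are never
    re-tokenized.
    """
    merged = {term: list(ids) for term, ids in main.items()}
    for term, ids in small.items():
        merged[term] = sorted(set(merged.get(term, [])) | set(ids))
    return merged


def search_all(index, terms):
    """Return the sorted ids of documents containing every query term."""
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


main = build_index({1: ["全文", "检索"], 2: ["索引", "检索"]})
new = build_index({3: ["全文", "索引"]})   # small index for newly added files
index = merge_indexes(main, new)
print(search_all(index, ["全文"]))           # [1, 3]
print(search_all(index, ["检索", "索引"]))   # [2]
```

Adding a batch of files only requires tokenizing and indexing the new files; the merged result is the same as an index rebuilt from the whole collection, which is the property that makes incremental updates cheap.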
In addition, a CLucene-based index combined with a Chinese word segmenter can quickly provide full-text search over a document library and scales very well, which is why this approach is used not only in search engines but also widely in current specialized literature retrieval systems.

Finally, the thesis examines the key points of the system's design and implementation. It discusses the precision and recall of Chinese word segmentation, how search results are returned, preventing errors in the query interface, and handling queries that mix English and Chinese, and it offers the author's views on each of these issues in the hope that they will be of some help to the reader.
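The abstract does not say which segmentation algorithm the Chinese analyzer uses. As an illustration only, the sketch below uses forward maximum matching, one common dictionary-based technique, and scores a segmentation by word-level precision and recall against a hand-segmented reference. The dictionary, the `max_len` limit, and the function names `fmm_segment`, `to_spans`, and `precision_recall` are assumptions for this example, not part of CLucene.

```python
def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: scanning left to right, take the longest
    dictionary word (at most max_len characters) starting at the current
    position; if no dictionary word matches, emit a single character."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def to_spans(words):
    """Map a word sequence to the set of its (start, end) character offsets."""
    spans, pos = set(), 0
    for word in words:
        spans.add((pos, pos + len(word)))
        pos += len(word)
    return spans


def precision_recall(predicted, gold):
    """Word-level precision and recall: a predicted word counts as correct
    only if the gold segmentation has a word with exactly the same span.
    Both segmentations are assumed to be non-empty."""
    pred_spans, gold_spans = to_spans(predicted), to_spans(gold)
    correct = len(pred_spans & gold_spans)
    return correct / len(pred_spans), correct / len(gold_spans)


dictionary = {"全文", "检索", "系统", "全文检索"}
predicted = fmm_segment("全文检索系统", dictionary)
print(predicted)  # ['全文检索', '系统']
print(precision_recall(predicted, ["全文", "检索", "系统"]))
```

Against the reference segmentation 全文 / 检索 / 系统, the greedy match 全文检索 / 系统 gets one of two words right (precision 1/2) and finds one of three reference words (recall 1/3). The choice matters for retrieval: if 全文检索 is indexed as a single term, an exact-term query for 检索 alone will not match that document.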
Keywords/Search Tags: full-text retrieval, CLucene, Word, index, text extraction