
The Research And Implementation Of Full-Text System Based On Lucene And Textual Image

Posted on: 2013-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: L Xu
GTID: 2248330371467101
Subject: Information security
Abstract/Summary:
With the popularity of the Internet and the explosive growth of digital information, people can access vast amounts of information without leaving home. Users must find the information they need within this mass of data, and that information is no longer limited to text documents: with the development of multimedia technology, audio, image and video files have increasingly become the objects of user queries. At the same time, with the rise of the paperless office and the electronic library, much traditional media, mostly paper books and documents, now exists as text images, so processing and retrieving text images has become a major challenge for information retrieval systems. A full-text retrieval system is an index-based application that is both write-intensive and read-intensive. Traditional full-text retrieval systems search a single type of object and cannot meet users' need to retrieve many kinds of objects, and there is still room for optimization in both inverted index construction and the query stage. This thesis takes a text image retrieval model as its object of study, researches optimization models for inverted index construction and querying as well as the preprocessing and classification of text images, and designs and implements a full-text retrieval system for text images.
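The abstract does not say which inverted-index optimizations the thesis adopts. As a rough illustration of where space and query-time savings in such systems commonly come from, the Python sketch below stores each postings list as gaps between sorted document IDs, compresses the gaps with variable-byte encoding, and answers conjunctive (AND) queries by intersecting the shortest postings list first. Gap plus variable-byte encoding is a standard textbook technique assumed here, not taken from the thesis; the function names and the whitespace tokenizer are illustrative, and Lucene's real index format is considerably more elaborate.

```python
from collections import defaultdict


def vbyte_encode(numbers):
    """Variable-byte encode non-negative integers (7 data bits per byte).

    The last byte of each integer has its high bit set as a terminator.
    """
    out = bytearray()
    for n in numbers:
        chunk = []
        while True:
            chunk.insert(0, n % 128)
            if n < 128:
                break
            n //= 128
        chunk[-1] += 128  # mark the final byte of this integer
        out.extend(chunk)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n = [], 0
    for byte in data:
        if byte < 128:
            n = n * 128 + byte
        else:
            numbers.append(n * 128 + (byte - 128))
            n = 0
    return numbers


def build_index(docs):
    """Map each term to a gap-encoded, variable-byte-compressed postings list.

    docs: {doc_id (non-negative int): text}. Tokenization is a naive
    lowercase whitespace split, an assumption for illustration only.
    """
    postings = defaultdict(list)
    for doc_id in sorted(docs):  # ascending IDs keep every gap positive
        for term in set(docs[doc_id].lower().split()):
            postings[term].append(doc_id)
    index = {}
    for term, ids in postings.items():
        gaps = [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]
        index[term] = vbyte_encode(gaps)
    return index


def get_postings(index, term):
    """Decode a term's postings back into absolute document IDs."""
    ids, total = [], 0
    for gap in vbyte_decode(index.get(term, b"")):
        total += gap
        ids.append(total)
    return ids


def and_query(index, terms):
    """Return documents containing all terms, intersecting shortest list first."""
    lists = sorted((get_postings(index, t.lower()) for t in terms), key=len)
    if not lists:
        return []
    result = set(lists[0])
    for ids in lists[1:]:
        if not result:
            break  # early exit: the intersection can only shrink
        result &= set(ids)
    return sorted(result)
```

For example, indexing `{1: "lucene inverted index", 2: "text image retrieval with lucene", 5: "svm text classification"}` stores the postings for `text` as the gaps `[2, 3]`, and `and_query(index, ["text", "lucene"])` returns `[2]`. Starting from the shortest list keeps the intermediate result small, which is the usual motivation for this ordering.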
Specific tasks are as follows:
(1) Several common full-text index models are described. Chapter 3 focuses in detail on an optimization model for inverted index construction and on optimizing the retrieval process, reducing storage space and proposing run-time optimizations that improve retrieval performance.
(2) For text image classification, the text that OCR extracts from a text image is often incomplete. This text is first pre-classified and its features are recovered, and it is then classified with an SVM. Compared with traditional classification methods, the added feature recovery and feedback steps improve the F1 measure of the classification.
(3) A full-text retrieval system based on Lucene is designed and implemented.
Using these improvements and research methods, a full-text retrieval system for document images was designed and implemented. Experimental results show that classifying the text images and the user's query in advance allows the search results that users prefer to be weighted more heavily.
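The last sentence above is compressed: both the text images and the user's query are classified, and the classes are used to weight the results users are likely to prefer. One plausible reading, sketched below in Python under explicit assumptions, is a re-ranking step. Each hit carries the class predicted for its OCR text (in the thesis this would come from the SVM classifier; here it is simply given), the query is assigned a class, and hits in the query's class have their score multiplied by a boost factor. `Hit`, `classify_query`, `rerank_by_class` and the default `boost` of 1.5 are hypothetical, and the keyword-overlap query classifier is a toy stand-in rather than the thesis's method.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: int
    score: float    # base relevance score from the full-text engine
    doc_class: str  # class predicted for the document image's OCR text


def classify_query(query, class_keywords):
    """Toy query classifier (hypothetical): pick the class whose keyword
    set overlaps the query terms most; ties go to the first class listed."""
    terms = set(query.lower().split())
    return max(class_keywords, key=lambda c: len(terms & class_keywords[c]))


def rerank_by_class(hits, query_class, boost=1.5):
    """Re-rank hits, multiplying the score of same-class documents by boost.

    boost is an assumed value; the abstract gives no weighting scheme.
    """
    def weighted(hit):
        factor = boost if hit.doc_class == query_class else 1.0
        return hit.score * factor

    return sorted(hits, key=weighted, reverse=True)
```

With hits `Hit(1, 2.0, "finance")`, `Hit(2, 1.6, "medicine")`, `Hit(3, 1.2, "medicine")` and a query classified as `"medicine"`, the boosted order becomes documents 2, 1, 3, while an unrelated query class leaves the base order 1, 2, 3 unchanged. A multiplicative boost is used so that a strongly relevant document from another class can still outrank a weak same-class match.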
Keywords/Search Tags: Lucene, Full-Text Retrieval, Inverted Index, Text Categorization