Design And Improvement Of Website Full-text Retrieval System Based On Lucene

Posted on: 2016-03-27  Degree: Master  Type: Thesis
Country: China  Candidate: H F Luo  Full Text: PDF
GTID: 2308330464967242  Subject: Information and Communication Engineering
Abstract/Summary:
With the development of information and Internet technology, the volume of information has grown explosively, and society has entered the information era. While people enjoy the convenience the Internet brings, quickly finding information that meets a user's needs within massive data has become a key problem in information retrieval. Websites on the Internet are continuously created, updated, and removed; search engine technology cuts through this complexity and gives users a powerful means of finding the information they need. Lucene is an open-source, object-oriented search architecture. Using the Lucene core, we build an independent retrieval system that can be deployed in different environments.

Through analysis of the Lucene framework, this thesis describes the retrieval methods and basic principles of full-text retrieval systems. Considering the diversity of current website information, we analyze the disadvantages of full-text retrieval based on relational databases. We then develop a practical, highly versatile full-text retrieval system for websites that meets the search needs of network users. The main work and results are as follows:

(1) We analyze in depth the structure and principles of a Lucene-based full-text search engine and study Lucene-based Chinese word segmentation. We design an improved Chinese Analyzer that takes Chinese semantics into account, and we build a synonym lexicon to implement synonym retrieval.
(2) Because Lucene can only search text data, we introduce Tika for text parsing. The Tika text parser, which is applicable to all types of documents, extracts the content used to build the index. This avoids the complexity of using a different parser for each file format.
(3) We design a website message distribution system, with a complete mechanism of its own, to test the performance of the retrieval system. By coordinating the MySQL database schema design with the optimization of the retrieval system, we implement website search.
(4) We design and implement extension modules for the retrieval system, such as highlighted search, near-real-time search, and Solr. Near-real-time search allows files in the system to be indexed and searched quickly and reduces the cost of committing the index. Highlighted search and the Solr index service improve both the stability of the system and the user experience.
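To make item (1) concrete, the following is a minimal sketch of dictionary-based Chinese word segmentation feeding an inverted index, with query-time synonym expansion. It is not the thesis's Analyzer and does not use Lucene. It substitutes plain forward maximum matching for the semantic analyzer the abstract describes. The lexicon, the synonym table, the sample documents, and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy lexicon and synonym table (assumptions, not from the thesis).
LEXICON = {"网站", "全文", "检索", "系统", "搜索", "引擎", "数据", "数据库"}
SYNONYMS = {"检索": {"搜索"}, "搜索": {"检索"}}
MAX_WORD_LEN = max(len(word) for word in LEXICON)


def segment(text):
    """Forward maximum matching: at each position take the longest lexicon word."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when no lexicon word matches.
            if size == 1 or candidate in LEXICON:
                tokens.append(candidate)
                i += size
                break
    return tokens


def build_index(docs):
    """Inverted index: map each token to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in segment(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents matching any query token or one of its synonyms."""
    hits = set()
    for token in segment(query):
        for term in {token} | SYNONYMS.get(token, set()):
            hits |= index.get(term, set())
    return hits


# Hypothetical sample documents.
docs = {1: "网站全文检索系统", 2: "搜索引擎", 3: "网站系统"}
index = build_index(docs)
```

In this sketch a query for 检索 ("retrieval") also reaches document 2, which contains only the synonym 搜索 ("search"). Expanding synonyms at query time keeps the index small. Expanding at index time instead would trade index size for faster queries.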
Keywords/Search Tags: full-text retrieval, Chinese word segmentation, text parser Tika, near-real-time search, Solr