The Research And Application Of Full-text Retrieval Technology Based On Lucene

Posted on: 2016-08-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z F Wang
GTID: 2298330452466425
Subject: Software engineering
Abstract/Summary:
With the rapid development of Internet information technology, electronic devices hold more and more kinds of data. Since 2012, we have quietly entered the era of big data. As the amount of data within companies grows, more and more of them have begun to build their own information platforms and sharing sites. How to locate required information quickly, and how to find key information in a large body of text, has therefore become a problem, and full-text search technology is needed to solve it. How to use full-text search to build a distributed enterprise information system that provides efficient retrieval services is a difficult problem facing a growing number of enterprises. First, a traditional database retrieval system seriously limits search and query efficiency; it is almost impossible to use one the way search engines such as Baidu and Google are used. Second, relational database retrieval cannot deal with unstructured data effectively. Third, many companies want the system to be scalable and maintainable. This paper discusses full-text search technology and distributed technology and combines them to develop a high-performance, scalable distributed full-text retrieval system. The system deals with unstructured data effectively, significantly improves search efficiency, and is easy to extend and maintain.
The specific content of this paper includes:
1) First, we introduce full-text search technology and distributed technology, including Lucene's structure and workflow, the inverted index mechanism, sorting algorithms, word segmentation techniques, distributed computing, and distributed clusters.
2) Then we analyze the structure and principles of the inverted index and make some improvements to it; make recommendations for optimizing the retrieval process; and study Lucene's sorting algorithms and segmentation techniques, improving the existing word matching algorithm to support maximum matching.
3) Next, we design a full-text retrieval system and propose a simple and efficient distributed full-text retrieval system model. The design covers requirements analysis, the index-building process, the retrieval module, and the dictionary module. After analyzing the relationship between Solr and Lucene, and given Solr's advantages and features, we chose Solr to build a distributed full-text search server; this model can be extended and maintained easily.
4) Finally, we apply this model to an actual enterprise project. In this process, we complete and optimize index creation, implement paged queries with Lucene, and explain in detail the steps for building a full-text search server with Solr. A comparison of test results before and after adopting Solr shows that the system performs well. Moreover, the system is scalable and easy to maintain, and it can retrieve all kinds of unstructured data effectively. In short, the system meets the requirements of enterprise internal full-text retrieval.
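The abstract does not give the thesis's algorithms, so the following is only a generic sketch of three ideas it names: dictionary-based forward maximum matching for word segmentation, an inverted index from terms to document IDs, and a paged query over the matches. It is plain Python, not Lucene or Solr code, and every name in it (`forward_max_match`, `build_inverted_index`, `search`, the sample dictionary and documents) is a hypothetical chosen for illustration.

```python
from collections import defaultdict


def forward_max_match(text, dictionary, max_len=8):
    """Greedy forward maximum matching (illustrative, not the thesis's variant).

    At each position, take the longest substring (up to max_len characters)
    that is in the dictionary; fall back to a single character otherwise.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def build_inverted_index(docs, dictionary):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in forward_max_match(text, dictionary):
            index[token].add(doc_id)
    return index


def search(index, term, page=1, page_size=10):
    """Return one page of matching document IDs, ordered by ID."""
    hits = sorted(index.get(term, ()))
    start = (page - 1) * page_size
    return hits[start:start + page_size]


# Hypothetical sample data: unsegmented strings and a small word dictionary.
dictionary = {"full", "text", "fulltext", "search", "engine"}
docs = {1: "fulltextsearch", 2: "searchengine", 3: "textsearch"}
index = build_inverted_index(docs, dictionary)
```

With this sample data, `forward_max_match("fulltextsearch", dictionary)` prefers the longer word `fulltext` over `full`, and `search(index, "search", page=2, page_size=2)` returns the second page of matches. A real deployment would rely on the analyzers and paging support that Lucene and Solr already provide rather than on hand-written structures like these.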
Keywords/Search Tags: search engine, Lucene, Solr, full-text retrieval