Research And Application On Document Retrieval System Based On Lucene

Posted on: 2013-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Gao
Full Text: PDF
GTID: 2248330407961510
Subject: Computer application technology
Abstract/Summary:
Full-text retrieval is an important branch of modern information retrieval technology. It is a powerful tool for processing unstructured data and one of the key techniques of modern search. This paper studies full-text retrieval technology in depth. For web page ranking, it puts forward an improved PageRank algorithm that addresses two weaknesses of the traditional algorithm: topic drift and weights of deposition. The paper focuses on the application of full-text retrieval technology, using new technologies to improve retrieval performance and speed up retrieval.

PageRank is a page ranking algorithm based on web page links, proposed by Google. The traditional PageRank has two deficiencies: topic drift and weights of deposition. Building on a close study of the traditional algorithm, this paper proposes a quadratic-weighted improved PageRank algorithm that effectively mitigates both.

At present, full-text retrieval platforms are not very common. This paper introduces Lucene, a powerful full-text retrieval toolkit written entirely in Java, which is easy to embed into applications and has been widely used in recent years. Lucene is also completely open source, so it offers a good opportunity to study the key technologies of search engines, and studying its source code and carrying out secondary development is worthwhile.

In the application part, this paper designs and implements a document search system based on a policy of Lucene service outsourcing. The system is a web application in B/S (browser/server) mode; it adopts the mainstream MVC design pattern, uses Struts for the software architecture, and Java as the development language. The system consists of four modules: document entry, index establishment, query, and results processing. In the results processing module, the system improves Lucene's page sorting algorithm through quadratic weighting, and extensive experiments show good results.
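
The abstract describes PageRank as a ranking algorithm based on web page links. As a rough illustration of the traditional algorithm only, the sketch below runs the usual power iteration on a toy link graph. It does not reproduce the thesis's quadratic-weighted variant, whose details the abstract does not give. The function name `pagerank`, the damping factor 0.85, the even redistribution of rank from pages without out-links, and the graph `toy_graph` are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Traditional PageRank by power iteration (illustrative sketch).

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Assumption: rank held by pages with no out-links is spread
        # evenly over all pages, so the total rank stays equal to 1.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page passes an equal share of its rank along every out-link.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new_rank[t] += share

        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical four-page graph: C receives links from A, B and D.
toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph, page C collects the most incoming links and ends up ranked first, while D, which no page links to, keeps only the baseline share. Because the score depends on link structure alone and ignores page content, a well-linked but off-topic page can still rank highly, which is the kind of behavior the topic drift criticism refers to.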
Keywords/Search Tags: Full-text retrieval, Page ranking, Weights of deposition, PageRank, Lucene