The Research And Application Of Desktop Search Engine Based On Lucene

Posted on: 2013-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: J P Huang
Full Text: PDF
GTID: 2268330398995303
Subject: Computer technology
Abstract/Summary:
With the rapid popularization of computers and the growth of disk capacity, the types and number of files stored on hard drives increase day by day. Finding the required documents among local files quickly and accurately has become a strong demand in personal computing. Google, Baidu, Yahoo and other search engine providers have developed desktop search engines to meet this demand, but for commercial reasons they have not released these products as open source. As a result, the products struggle to meet users' individual and diverse practical needs, cannot be integrated well into users' own software frameworks, and carry hidden risks to user privacy and security. It is therefore important to design and develop desktop search engines that meet actual application needs on the basis of an existing open-source search framework.

This thesis studies desktop search engines in combination with the open-source search framework Lucene. It first introduces the concept and system architecture of the desktop search engine, and then analyzes in depth the theories and technologies of data crawling and extraction, covering file path crawling and file content parsing. It then studies full-text indexing technology: index construction methods, the performance of inverted index files, and index compression. The main information retrieval models are analyzed through a study of extended Boolean retrieval and the vector space model. The thesis also introduces the characteristics and system architecture of the Lucene search framework, examines Lucene's built-in analyzers and its Chinese analyzers in depth, and compares the effect of each analyzer. On the indexing side, it studies the Lucene indexing process, index operations, index optimization, and the index lock mechanism. On the retrieval side, it analyzes the core classes of Lucene search, the scoring mechanism, and extended Lucene search techniques.

Combining this basic theory with the related technologies, the thesis implements, in an integrated development environment, a file path crawling module, a file dynamic monitoring module, a document parsing module, and an indexing and retrieval module. The desktop search user interface follows a simple and friendly design concept, and the functional modules and the project are deployed on a Tomcat server. The resulting system can index and retrieve the document types commonly used on a PC. In addition, because files on a PC change frequently and the index must be kept up to date, the thesis proposes and applies file dynamic monitoring technology: it monitors file creation, modification, and deletion, and promptly adds, updates, or deletes the corresponding documents in the index, so that the index is updated in real time as documents change. In actual tests, the desktop search engine proved fast, accurate in indexing and retrieval, and real-time. To a certain extent it compensates for the deficiencies of existing desktop search tools and has good practical value.
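
The index update cycle described in the abstract (add on file creation, update on modification, delete on removal) can be illustrated with a small sketch. It does not reproduce the thesis implementation and does not use the Lucene API. The names TinyIndex, snapshot, and sync are hypothetical, the index is a plain in-memory inverted index, and change detection is done by comparing directory snapshots, a simple polling substitute for the file dynamic monitoring technology the thesis uses.

```python
import os
import re
from collections import defaultdict


class TinyIndex:
    """Hypothetical in-memory inverted index: term -> set of file paths."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> paths containing the term
        self.doc_terms = {}               # path -> terms indexed for that path

    def add(self, path, text):
        terms = set(re.findall(r"\w+", text.lower()))
        self.doc_terms[path] = terms
        for term in terms:
            self.postings[term].add(path)

    def delete(self, path):
        # Remove the path from every posting list it appears in.
        for term in self.doc_terms.pop(path, set()):
            self.postings[term].discard(path)
            if not self.postings[term]:
                del self.postings[term]

    def update(self, path, text):
        # An update is modelled as delete followed by add.
        self.delete(path)
        self.add(path, text)

    def search(self, query):
        """Boolean AND over the query terms."""
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        result = None
        for term in terms:
            found = self.postings.get(term, set())
            result = set(found) if result is None else result & found
        return result


def read_text(path):
    with open(path, encoding="utf-8", errors="ignore") as handle:
        return handle.read()


def snapshot(root):
    """Map each file under root to a (modification time, size) signature."""
    state = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            info = os.stat(path)
            state[path] = (info.st_mtime_ns, info.st_size)
    return state


def sync(index, root, previous):
    """Apply file changes since the previous snapshot; return the new one."""
    current = snapshot(root)
    for path in previous.keys() - current.keys():
        index.delete(path)                      # file was deleted
    for path, signature in current.items():
        if path not in previous:
            index.add(path, read_text(path))    # file was created
        elif previous[path] != signature:
            index.update(path, read_text(path)) # file was modified
    return current
```

Calling sync(index, root, {}) builds the initial index for a directory, and calling it again with the returned snapshot applies only the changes made since then. A notification-based monitor would trigger the same three index operations without rescanning the directory.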
Keywords/Search Tags: Desktop search engine, Lucene, Document parsing, Dynamic monitoring, Index and retrieve