Search Engine Based on Lucene

Posted on: 2011-06-04 | Degree: Master | Type: Thesis
Country: China | Candidate: B Zhang | Full Text: PDF
GTID: 2208360302992365 | Subject: Computer application technology
Abstract/Summary:
A database search engine built on Lucene and Heritrix is an approach that brings the technical strengths of databases in crawling and indexing into the search engine. Users enter keywords through a query interface; the keywords are segmented against an analysis dictionary, and the search engine then searches the index for related documents and returns the results. The approach also performs well with respect to server security and to outdated links that are not refreshed in time. This thesis analyzes the working principles and key technologies of Lucene+Heritrix-based database search engines. It introduces the indexing and search principles of Lucene. In building the database thesaurus, it studies word segmentation techniques, improves on conventional segmentation, and develops its own Chinese word segmentation module on top of Lucene: the dictionary is held in RAM as a tree structure combined with linked lists, and input strings are segmented into words with the maximum matching algorithm. The module is implemented and compared experimentally with a traditional Analyzer in terms of space, time, and accuracy. The thesis then studies the similarity calculation built into Lucene and adds vector weights to it, making the calculation more accurate. Finally, the thesis implements a Lucene-based database search engine.
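The abstract names the segmentation technique but gives no code. As a minimal sketch, assuming the "maximum matching algorithm" means forward maximum matching and that the in-memory tree is a character trie, the dictionary lookup and segmentation loop could look like this. The class and method names are hypothetical, a HashMap per node stands in for the thesis's tree-plus-linked-list layout, and the Lucene Tokenizer wrapper (whose API differs between Lucene versions) is omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: in-memory dictionary trie plus forward maximum matching. */
class MaxMatchSegmenter {

    /** One trie node per character; isWord marks the end of a dictionary entry. */
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    /** Inserts a dictionary word into the trie, one character per level. */
    public void addWord(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    /**
     * Forward maximum matching: at each position, take the longest dictionary
     * word starting there; if no word matches, emit a single character.
     */
    public List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            Node node = root;
            int longestEnd = -1; // exclusive end index of the longest match so far
            for (int j = i; j < text.length(); j++) {
                node = node.children.get(text.charAt(j));
                if (node == null) break;          // no dictionary word continues here
                if (node.isWord) longestEnd = j + 1;
            }
            int end = (longestEnd == -1) ? i + 1 : longestEnd;
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        MaxMatchSegmenter seg = new MaxMatchSegmenter();
        for (String w : new String[] {"搜索", "搜索引擎", "引擎", "中文", "分词"}) {
            seg.addWord(w);
        }
        System.out.println(seg.segment("中文搜索引擎分词")); // prints [中文, 搜索引擎, 分词]
    }
}
```

Forward maximum matching is greedy: with the dictionary {ab, abc, cd}, the input "abcd" becomes "abc" + "d" rather than "ab" + "cd", so segmentation accuracy depends on the dictionary as well as on the matching rule.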
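The abstract does not say exactly how the vector weights enter Lucene's similarity calculation. One plausible reading, shown only as a sketch: treat the query and each document as TF-IDF vectors, with tf and idf patterned on Lucene's classic formulas (square root of term frequency, and 1 + ln(numDocs / (docFreq + 1))), multiply each component by a per-term weight, and rank by cosine similarity. This is plain Java with no Lucene dependency and uses a plain cosine in place of Lucene's full scoring function; WeightedCosine, vector, classicIdf, and the weight map are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: cosine similarity over TF-IDF vectors with extra per-term weights. */
class WeightedCosine {

    /** idf patterned on Lucene's classic formula: 1 + ln(numDocs / (docFreq + 1)). */
    static double classicIdf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    /**
     * Builds a sparse vector: sqrt(tf) * idf * termWeight for each distinct term.
     * Terms missing from termWeight get weight 1.0; terms missing from idf get 0.0.
     */
    static Map<String, Double> vector(List<String> terms, Map<String, Double> idf,
                                      Map<String, Double> termWeight) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : terms) tf.merge(t, 1, Integer::sum);
        Map<String, Double> vec = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            String t = e.getKey();
            vec.put(t, Math.sqrt(e.getValue())
                    * idf.getOrDefault(t, 0.0)
                    * termWeight.getOrDefault(t, 1.0)); // hypothetical extra weight
        }
        return vec;
    }

    /** Cosine of the angle between two sparse vectors; 0 if either has zero length. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("lucene", "search", "engine"),
                List.of("database", "search"),
                List.of("chinese", "word", "analysis"));
        // Document frequency: number of documents containing each term.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> d : docs) {
            for (String t : new HashSet<>(d)) df.merge(t, 1, Integer::sum);
        }
        Map<String, Double> idf = new HashMap<>();
        df.forEach((t, n) -> idf.put(t, classicIdf(n, docs.size())));

        Map<String, Double> weights = Map.of("lucene", 2.0); // hypothetical: boost "lucene"
        Map<String, Double> query = vector(List.of("lucene", "search"), idf, weights);
        for (List<String> d : docs) {
            System.out.printf("%.3f  %s%n", cosine(query, vector(d, idf, weights)), d);
        }
    }
}
```

With the sample data, the first document scores highest, the second gets a lower non-zero score from the shared term "search", and the third scores 0. Raising a term's weight stretches that dimension in both vectors, so documents sharing the boosted term move closer to the query.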
Keywords/Search Tags: Search Engine, Lucene, Chinese Word Analysis