Research And Improvement Of Lucene-based Search Engine

Posted on: 2007-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: H M Wu
Full Text: PDF
GTID: 2178360212972179
Subject: Computer application technology
Abstract/Summary:
Search engines are one of the main tools people use to obtain information from the web, and full-text retrieval has become an active topic of academic research on search engines. Lucene is a full-text indexing and retrieval software package that is widely used in both industry and academia. It can be conveniently embedded into applications that need full-text retrieval.

The robot (crawler) is the key component a search engine uses to collect resources, and its capability directly affects the quantity and quality of the information that can be retrieved. However, Lucene does not provide a robot. Based on a study of the technologies behind search engines and Lucene, this paper presents a multi-threaded robot whose number of threads can be set on instantiation. It can be regarded as an extension of Lucene.

In addition, this paper improves the ranking algorithm used in Lucene. Ranking plays a vital role in the success of a search engine because most users are interested only in the first few documents that appear in the results. Besides the factors Lucene already considers, the enhanced ranking algorithm also takes into account links, text length, and the special positions in which the query terms appear. Experimental results show that the improved ranking algorithm outperforms the original one.
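
The abstract does not describe the robot's internals, so the following is only a minimal sketch of the general idea of a multi-threaded robot whose thread count is fixed on instantiation. The class name `SimpleRobot`, the `linkFetcher` parameter, and the in-memory link map are hypothetical and are not taken from the thesis. Real page fetching, HTML parsing, robots.txt handling, politeness delays, and the hand-off of pages to a Lucene index are all omitted.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: a crawler whose worker-thread count is set in the constructor.
class SimpleRobot {
    private final int threadCount;
    // Assumed abstraction: maps a URL to the links found on that page.
    private final Function<String, List<String>> linkFetcher;

    SimpleRobot(int threadCount, Function<String, List<String>> linkFetcher) {
        if (threadCount < 1) {
            throw new IllegalArgumentException("threadCount must be at least 1");
        }
        this.threadCount = threadCount;
        this.linkFetcher = linkFetcher;
    }

    // Crawls everything reachable from seedUrl and returns the set of discovered URLs.
    Set<String> crawl(String seedUrl) {
        Set<String> discovered = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        AtomicInteger pending = new AtomicInteger(0);      // tasks scheduled but not yet finished
        CountDownLatch finished = new CountDownLatch(1);   // released when pending drops to zero
        try {
            schedule(seedUrl, discovered, pool, pending, finished);
            finished.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("crawl interrupted", e);
        } finally {
            pool.shutdown();
        }
        return discovered;
    }

    private void schedule(String url, Set<String> discovered, ExecutorService pool,
                          AtomicInteger pending, CountDownLatch finished) {
        if (!discovered.add(url)) {
            return; // another thread has already claimed this URL
        }
        pending.incrementAndGet();
        pool.execute(() -> {
            try {
                for (String link : linkFetcher.apply(url)) {
                    // Children are counted before this task is marked done,
                    // so pending cannot reach zero while work remains.
                    schedule(link, discovered, pool, pending, finished);
                }
            } catch (RuntimeException e) {
                // A failed fetch skips this page; the crawl continues.
            } finally {
                if (pending.decrementAndGet() == 0) {
                    finished.countDown();
                }
            }
        });
    }

    public static void main(String[] args) {
        // In-memory stand-in for the web, used only for illustration.
        Map<String, List<String>> web = new HashMap<>();
        web.put("a", Arrays.asList("b", "c"));
        web.put("b", Arrays.asList("a", "d"));
        SimpleRobot robot = new SimpleRobot(4,
                url -> web.getOrDefault(url, Collections.<String>emptyList()));
        System.out.println(new TreeSet<>(robot.crawl("a")));
    }
}
```

Passing the link fetcher in as a function keeps the threading logic separate from network access, so the same skeleton can be exercised against an in-memory map or wired to a real HTTP client.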
Keywords/Search Tags: search engine, full-text retrieval, ranking algorithm, Lucene, Robot