
Research And Implementation Of Vertical Search Engine Based On Lucene

Posted on: 2017-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: B Hu
GTID: 2348330503992920
Subject: Computer technology
Abstract/Summary:
A vertical search engine is an information retrieval tool whose index data tends to be structured and whose retrieval scope tends to be limited to a single industry. Compared with general search engines, the results returned by a vertical search engine are more precise. This research focuses on vertical search engines built on the information retrieval tool Lucene. Based on the basic Lucene sorting algorithm and popular retrieval models, we propose a sorting algorithm that fuses position features with probability ranking. Based on the basic principles and structure of vertical search engines, a vertical search engine for small cars is designed and implemented using the new sorting algorithm. The main work covers the following four aspects.

Firstly, a query weight algorithm based on the Vector Space Model is proposed to reflect term weight. Using the location and frequency of the query in the document, the TF-IDF calculation of the Vector Space Model is improved to obtain a location-related query weight.

Secondly, an improved algorithm that fuses position features with probability ranking is proposed through in-depth analysis of the basic Lucene sorting algorithm. First, the location-related query weight is fused into the scoring formula of the Lucene sorting algorithm, taking into account the effect of the query's location in the document on document relevance. Then, the probability of document relevance, computed with a naive Bayesian classification algorithm, is also fused into the formula.

Thirdly, a small car vertical search engine based on Lucene is constructed, covering the collection of automotive product pages, the parsing of Web documents, the extraction of structured information, the building of the index, and the retrieval of related documents. The relevant documents are sorted by the Lucene sorting algorithm that fuses position features with probability ranking.

Finally, a comparative experiment is designed to compare the performance and effectiveness of the original Lucene sorting algorithm and the improved algorithm. The experiment shows that the precision of the retrieval system using the improved algorithm increases dramatically, and that the recall and the F value improve and become more stable. The improved sorting algorithm solves the problems of the original algorithm concerning query location and theoretical support, and increases precision. The new sorting algorithm is independent and reusable, so it can provide sorting support for vertical search engines on different themes. The car vertical search engine system has a simple structure and function interface, which makes updating and improvement convenient.
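The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea: a TF-IDF weight that depends on where the query term appears, multiplied by a boost derived from a relevance probability. The field weights in `FIELD_WEIGHTS`, the smoothed IDF, the multiplicative fusion, the parameter `beta`, and all function names are assumptions made for illustration, not the thesis's formula and not Lucene's API. The relevance probability is taken as an input here; in the thesis it comes from a naive Bayesian classifier.

```python
import math
from collections import Counter

# Assumed field weights (hypothetical): a match in the title counts more
# than a match in the body. The thesis does not state its actual values.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}


def position_weighted_tf(term, doc):
    """Term frequency summed over fields, each scaled by its field weight."""
    return sum(
        FIELD_WEIGHTS.get(field, 1.0) * Counter(tokens)[term]
        for field, tokens in doc.items()
    )


def idf(term, docs):
    """Smoothed inverse document frequency (one common variant, assumed here)."""
    df = sum(1 for d in docs if any(term in tokens for tokens in d.values()))
    return math.log((1 + len(docs)) / (1 + df)) + 1.0


def fused_score(query_terms, doc, docs, relevance_prob, beta=1.0):
    """Position-weighted TF-IDF boosted by a relevance probability in [0, 1].

    The multiplicative fusion is an illustrative choice, not the thesis's formula.
    """
    tfidf = sum(position_weighted_tf(t, doc) * idf(t, docs) for t in query_terms)
    return tfidf * (1.0 + beta * relevance_prob)


# Two toy documents: the query term appears in the title of one and the body of the other.
doc_a = {"title": ["compact", "sedan"], "body": ["price", "list"]}
doc_b = {"title": ["price", "list"], "body": ["compact", "sedan"]}
corpus = [doc_a, doc_b]

score_a = fused_score(["sedan"], doc_a, corpus, relevance_prob=0.5)
score_b = fused_score(["sedan"], doc_b, corpus, relevance_prob=0.5)
```

With equal relevance probabilities, `doc_a` scores higher than `doc_b` because the title match carries a larger position weight; raising `relevance_prob` for a document raises its score proportionally.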
Keywords/Search Tags: Vertical search engine, Lucene, Sorting algorithm, Location-related, Probability ranking