Personalized Search Engine System Based On Lucene

Posted on: 2014-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Miao
Full Text: PDF
GTID: 2248330392462903
Subject: Computer software and theory
Abstract/Summary:
The rapid development of the Internet has brought an explosion of knowledge, so users must rely on search tools to find the information they need among huge amounts of data. Demand shapes the market, and various search engines such as Baidu and Google have emerged.

Traditional retrieval techniques are already quite mature in both theory and practice. Open source communities have produced third-party API libraries such as Xapian and Lucene, along with complete search solutions built on them. This paper analyzes the principles, composition, and workflow of a search engine and the theory behind each module, and then focuses on the well-known Lucene class library, covering its module structure, file formats, indexing process, and result sorting.

Current mainstream search solutions do not execute JavaScript and make compromises on the volume and speed of web crawling. The recent arrival of fast JavaScript interpreter engines makes it possible to solve this problem. This paper embeds a JavaScript interpreter engine to improve script understanding. Borrowing the principle of operator overloading from C++, it overloads the URL-related script operations as set operations in order to extract URLs. It then runs comparison tests on both an intranet and the Internet and summarizes the reasons for the success on the intranet and the failure on the Internet.

Link analysis is an important measure of web page quality, so this paper incorporates the PageRank algorithm into Lucene's original scoring formula for page ranking to improve scoring accuracy and the quality of search results. Lucene is well designed and exposes interfaces in each functional module to support customization. Based on these interfaces, the paper experimentally compares the modified formula with the original scoring formula.

Finally, the paper makes some practical explorations in search engine personalization.
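The idea of folding PageRank into a relevance score can be illustrated with a minimal Python sketch. The abstract does not state the modified scoring formula, so the linear blend below, the `weight` parameter, and the function names `pagerank` and `rerank` are assumptions made for illustration only; this is not Lucene's API and not necessarily the formula used in the thesis.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def rerank(hits, rank, weight=0.5):
    """Blend a text relevance score with normalized PageRank (assumed formula).

    hits: list of (page, text_score) pairs, e.g. from a full-text search.
    weight: hypothetical mixing factor in [0, 1]; 0 keeps the text score only.
    """
    top = max(rank.values())
    scored = [
        (page, (1.0 - weight) * text_score + weight * rank.get(page, 0.0) / top)
        for page, text_score in hits
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy link graph: page "c" is linked from three pages, "d" from none.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
results = rerank([("a", 0.5), ("d", 0.6)], ranks, weight=0.5)
```

In this toy graph, page "d" has the slightly better text score but no incoming links, so the blended score places "a" ahead of it; with `weight=0` the order falls back to the text score alone.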
Traditional search engines are generally based on keyword matching; they do not make full use of users' personal information and lack personalization features. The paper introduces theories on how to collect user information and how to build and use a user interest model. Guided by these theories and considering practical needs, I designed and developed a simple personalized search module. Simulation results show that the module is effective.
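One common way to use collected user information is to keep a weighted term profile and boost results that resemble it. The sketch below assumes that approach; the abstract does not describe the actual design of the module, so the class `InterestProfile`, the decay factor, the cosine-similarity boost, and the function `personalize` are all hypothetical.

```python
import math
from collections import Counter


class InterestProfile:
    """Hypothetical user interest profile: term weights learned from clicks."""

    def __init__(self, decay=0.9):
        self.weights = Counter()
        self.decay = decay  # assumed factor that lets older interests fade

    def record_click(self, terms):
        # Fade existing interests, then reinforce the terms of the clicked page.
        for term in list(self.weights):
            self.weights[term] *= self.decay
        self.weights.update(terms)

    def similarity(self, terms):
        # Cosine similarity between the profile and a document's term counts.
        doc = Counter(terms)
        dot = sum(self.weights[t] * c for t, c in doc.items())
        norm = math.sqrt(sum(w * w for w in self.weights.values())) * math.sqrt(
            sum(c * c for c in doc.values())
        )
        return dot / norm if norm else 0.0


def personalize(hits, profile, weight=1.0):
    """Re-rank (doc_id, base_score, terms) hits by boosting profile matches."""
    scored = [
        (doc_id, base * (1.0 + weight * profile.similarity(terms)))
        for doc_id, base, terms in hits
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


profile = InterestProfile()
profile.record_click(["lucene", "index"])
profile.record_click(["lucene", "search"])

hits = [
    ("cooking", 1.0, ["recipe", "pasta"]),
    ("lucene-doc", 0.9, ["lucene", "index"]),
]
ranked = personalize(hits, profile)
```

Here the document about Lucene overtakes the higher-scored cooking page because it overlaps with the clicked terms, while an empty profile leaves the original order unchanged.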
Keywords/Search Tags: Search engine, V8 engine, Lucene, PageRank, Personalization