Research And Implementation Of A Personalized Search Engine Based On The Mobile Information

Posted on: 2013-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: W Wu
GTID: 2248330374461175
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the search engine has become an important way for people to access information on the Web. People, however, want more than complete information resources from a search engine; they also place higher demands on the services it offers. Compared with traditional general-purpose search engines, the personalized search engine has become a hot research topic because it can provide users with personalized services. This thesis designs and implements a personalized search engine system, built on the popular Heritrix crawler and the Lucene search engine framework, that searches local resources for the mobile information users prefer.

First, the thesis introduces the background and general principles of search engines, then studies the Heritrix crawler in depth and optimizes it. As one of the main Web information extraction technologies, Heritrix is open source and scalable, but it crawls slowly. The thesis applies a multi-threaded optimization based on the ELFHash algorithm to increase the number of Heritrix crawler threads, improving both the accuracy with which specified pages are crawled and the crawling speed.

Second, the thesis adopts Lucene as the search framework of the personalized search engine, studies its indexing and sorting techniques, and optimizes its sorting algorithm. The results produced by the original Lucene sorting algorithm do not accurately show how closely a result page relates to the topic, nor do they reflect the importance of the crawled pages. The thesis proposes a new algorithm, built on the PageRank algorithm, that improves Lucene's sorting by computing both the correlation between a page and the user's interests and the importance of the page. The experimental results show that the proposed algorithm improves the sorted results considerably.

Third, the thesis designs and implements a search engine system that provides personalized mobile information services. Based on an analysis of user requirements, the system is divided into four modules: page crawling, page parsing, information indexing, and information retrieval. The thesis describes the design ideas and implementation process of each module in detail. The experimental results show that the functionality and performance of the system meet the requirements.

Finally, the thesis summarizes the research and outlines future work. It also lists the papers published and the results achieved by the author during graduate study, and points out the shortcomings of the proposed system.
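The abstract does not say exactly how ELFHash is wired into Heritrix. One plausible reading, sketched here as an assumption, is that each URL is hashed with ELFHash and assigned to one of a fixed number of work queues, so that URLs from a single host are spread across many queues and more crawler threads can run in parallel. The class name ElfHashQueues, the method queueFor, and the queue count are hypothetical; no Heritrix API is used.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: spread URLs over crawl queues with the classic ELF hash.
// This is not Heritrix code; it only illustrates the hashing idea.
final class ElfHashQueues {

    // Classic ELF hash over the UTF-8 bytes of the key.
    static int elfHash(String key) {
        int h = 0;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h = (h << 4) + (b & 0xFF);   // shift in the next byte
            int g = h & 0xF0000000;      // isolate the top four bits
            if (g != 0) {
                h ^= g >>> 24;           // fold them back into the low bits
            }
            h &= ~g;                     // clear the top four bits
        }
        return h;                        // always non-negative
    }

    // Map a URL to a queue index in [0, queueCount).
    static int queueFor(String url, int queueCount) {
        return elfHash(url) % queueCount;
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://example.com/a.html",
            "http://example.com/b.html",
            "http://example.com/news/c.html"
        };
        int queueCount = 8; // assumed value, chosen for illustration only
        for (String url : urls) {
            System.out.println(queueFor(url, queueCount) + "  " + url);
        }
    }
}
```

Because the hash depends on the full URL rather than only the host name, pages from one site need not share a single queue. A real crawler would still have to enforce per-host politeness limits separately.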
Keywords/Search Tags: Heritrix, Lucene, Personalized search engine, PageRank Algorithm
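The abstract describes the improved sorting only in outline: a page's importance (via PageRank) and its correlation with the user's interests are both taken into account. The sketch below is a minimal illustration of that idea, not the thesis's algorithm. It computes PageRank by power iteration on a small link graph and then blends three scores with a weighted sum. The class RankSketch, the linear combination, and the weights are all assumptions, and the scores are assumed to be normalized to [0, 1]. No Lucene API is used.

```java
import java.util.Arrays;

// Hypothetical sketch: PageRank by power iteration plus a weighted score blend.
final class RankSketch {

    // outLinks[p] lists the pages that page p links to.
    static double[] pageRank(int[][] outLinks, double damping, int iterations) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0; // rank held by pages with no out-links
            for (int p = 0; p < n; p++) {
                if (outLinks[p].length == 0) {
                    dangling += rank[p];
                    continue;
                }
                double share = rank[p] / outLinks[p].length;
                for (int q : outLinks[p]) {
                    next[q] += share;
                }
            }
            for (int q = 0; q < n; q++) {
                // Dangling rank is redistributed evenly across all pages.
                next[q] = (1.0 - damping) / n + damping * (next[q] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }

    // Assumed blend: weighted sum of text relevance, interest match, importance.
    static double combinedScore(double textScore, double interestMatch,
                                double importance, double wText,
                                double wInterest, double wImportance) {
        return wText * textScore + wInterest * interestMatch + wImportance * importance;
    }

    public static void main(String[] args) {
        int[][] links = { {1, 2}, {2}, {0} }; // toy graph: 0->1, 0->2, 1->2, 2->0
        double[] pr = pageRank(links, 0.85, 50);
        System.out.println(Arrays.toString(pr));
        // Weights 0.5, 0.3, 0.2 are illustrative, not taken from the thesis.
        System.out.println(combinedScore(0.8, 0.6, pr[2], 0.5, 0.3, 0.2));
    }
}
```

In practice the importance term would need rescaling before blending, since raw PageRank values shrink as the number of pages grows.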