
Research And Design On The Search Engine Based On The Enhanced Similarity Pagerank Algorithm

Posted on: 2015-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: B Xu
Full Text: PDF
GTID: 2298330452450130
Subject: Communication and Information System
Abstract/Summary:
We are now in the era of big data. Internet enterprises face a data explosion as the information on the web grows rapidly, and searching data quickly and efficiently has become a key challenge for most large and medium-sized enterprises. Search engines have therefore received growing attention from researchers. However, commercial search engines lack specificity and are expensive, so they cannot meet this demand, and enterprises have to develop their own search engines. Building a search engine from scratch, in turn, is very costly. Lucene provides an excellent basis for the research and development of search engines. Nevertheless, Lucene can be used as-is for most internal (in-site) search systems, but not for enterprise search systems. Moreover, Lucene's ranking algorithm ignores user feedback and favors older webpages, so it needs to be improved.

To solve these problems, this thesis proposes a search system that can be used with large databases, provides a friendly interface, and returns more accurate results. The research covers four aspects:

(1) First, we put forward a search engine architecture based on Lucene. We then proposed the SPR (Similar Page Rank) algorithm and studied the relation between user searches and the click-rate of webpages in order to incorporate user feedback. Further, to account for timeliness and the degree of user interest, we introduced a time function and a user-interest function and proposed the ESPR (Enhanced Similar Page Rank) algorithm.

(2) We analyzed the principle of the search system based on SPR and ESPR and discussed in detail how the new algorithm is embedded into the system. First, we redefined the index structure produced after the analyzer runs and described the adjustments made to the overall index structure. Second, we designed the scoring methods for ESPR, which can be used when building the search index.

(3) We studied the composition of the system and implemented it based on ESPR. Both the scoring module and the index structure can be user-defined. Lucene offers good encapsulation and inheritance, so the program can be modified through a custom scoring module, and the index changes required by the algorithm can be redefined in the corresponding module.

(4) Finally, we ran experiments to verify the search engine, comparing the ESPR-based search engine with two other search engines. The results were obtained from a technical team of five people. They show that our search engine achieves better search accuracy.

Owing to its better search accuracy and low cost, the ESPR algorithm proposed in this thesis can be applied by most Internet enterprises to search data accurately and quickly. It is also suitable for traditional enterprises that are moving onto the Internet.
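
The abstract does not give the SPR or ESPR formulas, so the following Python sketch is only an illustration of the general idea described in (1): a link-based rank combined with text similarity, click-rate feedback, user interest, and a time function. Every name, weight, and the exponential half-life decay here (`pagerank`, `time_decay`, `combined_score`, `w_click`, `w_interest`, `half_life_days`) is an assumption made for illustration, not the thesis's actual method and not part of the Lucene API.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank. `links` maps a page to its outlinks."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


def time_decay(age_days, half_life_days=180.0):
    """Hypothetical time function: the score halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)


def combined_score(link_rank, similarity, click_rate, age_days,
                   interest=0.0, w_click=0.5, w_interest=0.5):
    """Hypothetical ESPR-style score (assumed form, not the thesis formula).

    link_rank:  PageRank-style score of the page
    similarity: query/document text similarity in [0, 1]
    click_rate: fraction of impressions that users clicked, in [0, 1]
    interest:   assumed user-interest degree in [0, 1]
    """
    feedback = 1.0 + w_click * click_rate + w_interest * interest
    return link_rank * similarity * feedback * time_decay(age_days)


if __name__ == "__main__":
    ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []})
    for page in sorted(ranks, key=ranks.get, reverse=True):
        print(page, round(ranks[page], 4))
    # Same page quality, different age: the newer page scores higher.
    print(combined_score(ranks["c"], 0.8, 0.3, age_days=10))
    print(combined_score(ranks["c"], 0.8, 0.3, age_days=400))
```

With this assumed multiplicative form, a page that users click more often, or that was published more recently, outranks an otherwise identical page. This is the behavior the abstract attributes to the feedback and time functions, although the thesis may combine these factors differently.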
Keywords/Search Tags:search engine, Lucene, PageRank, ESPR, precision
Related items