
Research And Implementation Of Search Engine Ranking Algorithm Based On Nutch

Posted on: 2017-06-04 | Degree: Master | Type: Thesis
Country: China | Candidate: Q L Li
GTID: 2348330491962650 | Subject: Control theory and control engineering
Abstract/Summary:
With the explosive growth of information on the Internet, accurately finding the required information among the vast amount of web data has become an important problem. Search engines provide an effective tool for retrieving related information.

Search engines are user-oriented, so providing better service is the motivation for optimizing them. Faced with a flood of Internet information, users usually browse only the first few pages of search results, so a search engine's quality of service depends largely on its page ranking algorithm. Optimizing the page ranking algorithm is therefore the focus of search engine optimization.

Currently, the most widely used ranking algorithms are PageRank and HITS. PageRank is the more commonly used of the two because it is computationally efficient and can handle larger volumes of data. However, PageRank considers only the link structure of pages during its iterative process, so it suffers from a bias toward older web pages, topic drift, and unreasonable weight distribution. To improve ranking accuracy, this thesis proposes an improved algorithm named WCT-PageRank, based on an in-depth study of PageRank, which adds a PR allocation factor, a page-relevance factor, and a time factor.

Nutch is an open-source project whose plug-in mechanism makes it well suited to secondary development, so it is chosen as the development platform for the experiments. Because Nutch handles Chinese word segmentation poorly, IKAnalyzer is added to improve the segmentation results, and both the PageRank and WCT-PageRank algorithms are integrated. Experiments on the customized search engine show that WCT-PageRank achieves higher precision than PageRank.
Keywords/Search Tags: Search engine, Ranking algorithm, PageRank, WCT-PageRank, Nutch
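
The abstract's point that PageRank scores pages from link structure alone can be illustrated with a minimal sketch of the standard power-iteration form of the algorithm. This is not the thesis's implementation and not its Nutch plug-in: the function name `pagerank`, the `links` dictionary, the toy graph, and the damping value 0.85 (a conventional choice, not stated in the abstract) are all assumptions made for illustration. The WCT-PageRank factors are not defined in the abstract, so no attempt is made to reproduce them here.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Standard PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to
           (hypothetical input format chosen for this sketch).
    Returns a dict mapping each page to its score; scores sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page splits its rank equally among its outlinks.
        # Only link structure is used: no content, topic, or age information.
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share

        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break

    return rank


# Toy graph (illustrative only): D links out but has no inbound links.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph, page D has no inbound links and therefore receives only the baseline score of (1 - damping) / n, regardless of its content or how recent it is. That is one way to see why a purely link-based score can favor older, already well-linked pages, which is the kind of deficiency the abstract says the added factors are meant to address.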