
Research on Personalized Search Based on an Improved PageRank Algorithm and User Interest

Posted on: 2015-07-17 | Degree: Master | Type: Thesis
Country: China | Candidate: M F Zhang | Full Text: PDF
GTID: 2298330452994322 | Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology and the rapid growth of web information, the network has become an important way for people to obtain information. Users' desire to find information quickly is forcing traditional search engines to become more intelligent and personalized. A personalized search engine can sense user intent and meet users' individual needs. Therefore, this thesis designs a personalized search system based on the PageRank algorithm and a user interest model.

First, to address four defects of the PageRank algorithm, an improved PageRank algorithm is proposed that takes into account web page similarity, click traffic trends, authority, and a time factor. On the basis of page segmentation, similarity is estimated by analyzing position labels, anchor text, and the vector space model, which reduces topic drift. The development trend of a page is evaluated by analyzing its clicks and click growth rate. To influence the transfer of PR values and prevent web cheating, the authority of a page is calculated by evaluating its site and its external links. A time compensation factor eliminates the bias against new pages, so that new and old pages receive weights that match their actual value.

Second, a user interest model is built by analyzing the user's registration information, favorites, and history records, based on an improved VSM and a hybrid modeling approach. The model has two update mechanisms. The irregular update is triggered when a page is added to the favorites. The regular update is based on the Ebbinghaus forgetting curve and periodically updates the terms of the user model.

Finally, the workflow of the open-source search engine Nutch is analyzed, and secondary development is carried out on Nutch by adding a user interest module.
The improved PageRank algorithm replaces the original ranking algorithm in Nutch. A large number of pages crawled with Nutch serve as the experimental data for testing and relevant comparisons. Experimental results show that, compared with the traditional PageRank algorithm, the improved algorithm has a higher accuracy rate. The personalized search system based on the improved PageRank and user interest can better meet users' personalized requirements.
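
The abstract does not give the formulas of the improved algorithm, so the following Python sketch only illustrates the general idea of letting per-page factors steer how PR value is transferred. Everything specific in it is an assumption made for illustration: the names `combined_weight` and `weighted_pagerank`, the multiplicative combination of similarity, click trend, and authority, the half-life form of the time compensation, and the damping factor of 0.85. It should not be read as the thesis's actual method.

```python
from typing import Dict, List


def combined_weight(similarity: float, click_trend: float, authority: float,
                    age_days: float, half_life_days: float = 30.0) -> float:
    """Hypothetical per-page score (assumption, not the thesis formula).

    The time bonus is 1.0 for a brand-new page and halves every
    `half_life_days`, so new pages are compensated for lacking inlinks.
    """
    time_bonus = 0.5 ** (age_days / half_life_days)
    return similarity * click_trend * authority * (1.0 + time_bonus)


def weighted_pagerank(links: Dict[str, List[str]],
                      page_weight: Dict[str, float],
                      damping: float = 0.85,
                      iterations: int = 50) -> Dict[str, float]:
    """PageRank in which a page splits its rank among its outlinks in
    proportion to each target's weight instead of evenly.

    `links` maps a page to the pages it links to. `page_weight` maps a page
    to a positive score; pages missing from it default to 1.0, which reduces
    the computation to classic PageRank.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for src in pages:
            targets = links.get(src, [])
            if not targets:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[src] / n
                for p in pages:
                    new_rank[p] += share
                continue
            total = sum(page_weight.get(t, 1.0) for t in targets)
            for t in targets:
                w = page_weight.get(t, 1.0)
                new_rank[t] += damping * rank[src] * w / total
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    # Page "b" is new and on-topic, so it gets a larger (made-up) weight.
    weights = {"b": combined_weight(1.5, 1.0, 1.0, age_days=0.0)}
    for page, score in weighted_pagerank(graph, weights).items():
        print(page, round(score, 4))
```

With an empty `page_weight` the function behaves like the traditional algorithm, which makes it straightforward to compare the two rankings on the same crawl data.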
Keywords/Search Tags: PageRank, personalized search, similarity, click traffic trends, authority, time factors, user interests