Font Size: a A A

A Novel Page Ranking Method Based On Analyzing The Diversity Of Network Structure

Posted on:2015-01-09Degree:MasterType:Thesis
Country:ChinaCandidate:H C ChenFull Text:PDF
GTID:2268330428496147Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
With the rapid development of Internet technology, the application of search engine isincreasingly extensive and the performance of page ranking algorithm plays a key role indetermining the ability of search engine and the search experiment of users. The existingmethods of page ranking and spam detection merely consider the number and the quality ofinbound hyperlinks, while the spam pages in link farms are still contain a large number ofinbound hyperlinks. In this case, the biggest challenge that ranking algorithms confront ishow to distinguish spam pages and authoritative pages. In addition, the current onlinenetwork structures are dynamically change with time, while the damping factor whichcharacterize the probability of random jump of users browsing behavior is static. Thus howto modeling the browsing behavior in dynamic networks more vividly is another challenge.Based on the aforementioned challenges, in this paper, we propose two novel rankingalgorithms which summarized as following:(1) We propose a ranking algorithm based on analyzing the diversity of inboundhyperlinks. This algorithm considers that the sources of inbound hyperlinks can objectivelyreflect the authority of web pages. Comparing with real authority page, which has a largenumber inbound hyperlinks from a wide variety sources, the page whose rank is improvedby cheating method often don’t have the characteristic of wide diversity of their inboundhyperlinks. Specifically, we use the "social circle" concept of sociology to calculate thek-nearest neighbors for each page, and compute the page source diversity based on theoverlap of k-nearest neighbors between two pages. Then by analyzing the structuredifference between authoritative pages and web spam pages, we propose two link weightadjustment strategies to deal with child node manipulation and link exchange, respectively.Finally, we propose a PageRank-like ranking algorithm, called Drank, to calculate the valueof authority for pages. Our experimental results against several benchmark data sets showthat Drank has the best performance in terms of both finding high-quality pages andsuppressing web spams.(2) We propose another ranking algorithm based on analyzing the users’ browsing behaviors. This algorithm considers the probability of a user randomly jumps from currentpage to another to be variable, and the probability is inversely proportional to the number ofoutgoing links of current page, at the same time, the probability of a random jump to anotherweb page is proportional to its authority value. Specifically, we modeling the user browsingbehavior based on the aforementioned ideas, in this model we compute a specific randomjump probability for each page based on the features of its link structure. In order to combatweb spam, we propose two link weight adjustment strategies by analyzing the link structureof child node and target page in link farm. Finally, based on a random walk model wepresent a parameter free method to calculate the authorities for web pages. The experimentalresult on artificial dataset and real dataset shown that, our method has a better performancein terms of both ranking authority pages and suppressing web spams in the case of noparameters and no available priori information.
Keywords/Search Tags:Search engine, Ranking algorithm, Random walk, Browsing behavior, Hyperlinkanalysis, Spam detection, Probabilistic counting
PDF Full Text Request
Related items