Font Size: a A A

Research On Stochastic Models And Algorithms For Web Page Ranking

Posted on:2010-04-24Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y T LiuFull Text:PDF
GTID:1118360278452591Subject:Operational Research and Cybernetics
Abstract/Summary:PDF Full Text Request
As the World Wide Web grows rapidly, search engines have become the most popular tools to access the large volume of information from it. And the key factor of the search engine is the ranking model of Web pages, which contains two types: static rank and dynamic rank. In past, different approaches have been designed to solve these two problems separately. In this thesis, we analyze them on the same base - stochastic process model, and design new algorithms to solve them efficiently and effectively.Firstly, we establish a framework on Markov skeleton process to compute the page importance by investigating real browsing behaviors of users. We find that the importance of Web pages on users' view is determined by two factors: visiting frequency and the length of the staying time on them. And we prove that the stationary distribution of the Markov skeleton process (it exists under special conditions) can take these two factors together into the account. Within this framework, we design a group of eight new novel algorithms all referred to as BrowseRank, to compute the page importance based on the continuous-time time-homogeneous Markov process, which is one of three special cases of the Markov skeleton process. And from the experimental results, we find BrowseRank outperforms other baseline algorithms, such as PageRank and TrustRank. It is testified that stochastic process model is more effective and efficient for page importance calculation. More important, The first one of these eight algorithms was presented in SIGIR' 08, and now it has become the most popular algorithm in the field of commercial search engines. Such innovatory algorithm was evaluated by various reportages as the next PageRank and will impact the progress of search engines highly, such as "Microsoft tries to one-up Google PageRank" by CNET.com, "BrowseRank: The Next PageRank, Says Microsoft" by WebProNews and so on.Secondly, we build a supervised framework for rank aggregation based on Markov Chain. Within this framework, we not only generalize some unsupervised algorithms to supervised ones, but also design a new approach referred to as Supervised MC2 for rank aggregation, which transform the original NP-hard problem to a semi-defined programme. They justify that it is necessary to apply supervised learning method in aggregating rankings, and it is effective by using Markov model.Based on these two applications of stochastic process model in Web page ranking, we conclude that it is reasonable to use stochastic process to stimulate the ranking pro- cess, because 1) the transition between states in stochastic process can model the relationship among Web pages accurately, 2) the ranking process should arrive at a balance by considering as many as possible relationships among Web pages, while for some stochastic process, especially Markov process, the stationary distribution describes an invariable measure on all states. Therefore, this thesis makes progress in Web page ranking by leverage the stochastic models.
Keywords/Search Tags:Information retrieval, Web page ranking, static rank, dynamic rank, rank aggregation, Markov skeleton process, BrowseRank algorithm, supervised learning
PDF Full Text Request
Related items