Studies Of The Hyperlink Analysis Algorithms In WWW

Posted on: 2005-09-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Liu
GTID: 1118360185995651
Subject: Computer system architecture
Abstract/Summary:
The emergence of the WWW has introduced new challenges for traditional information retrieval (IR) technologies. Web search draws on theories and techniques from applied mathematics (such as graph theory, matrix theory and analysis), data mining, AI, NLP and other fields. The core of search engine technology is finding better search algorithms. Given the characteristics of Web data, the hyperlinks among web pages can be mined for additional useful information, and searching with hyperlinks can yield more effective Web information retrieval models. This dissertation studies how hyperlinks affect Web IR theories, algorithms and applications.

First, by comparing hyperlink analysis algorithms under different data environments and retrieval requirements, I analyzed how search results are affected by the way different types of links are processed and by the way the iteration rules and terminating conditions are set. I then proposed restricting conditions for hyperlink analysis algorithms on a closed data set. By comparing the hyperlink distributions of the closed data set and the real Web environment, I extended these restricting conditions to the real Web environment. In this way the effect of an algorithm can be predicted quantitatively, and the experimental results show that retrieval efficiency can be improved greatly.

Next, new optimized hyperlink analysis algorithms are proposed. One of them is Modilink, a query-independent approach that introduces new preprocessing algorithms and adjusts the normalization methods and the iterative terminating conditions. It also modifies the iterative formula of the PageRank algorithm to improve the overall iterative efficiency. The experimental results show that Modilink converges faster than PageRank and that, under the restricting conditions, retrieval efficiency can be improved. The other optimized hyperlink analysis algorithms are query-dependent.
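The abstract does not give Modilink's modified iteration formula, preprocessing or normalization, so what follows is not Modilink. It is a minimal sketch of the standard PageRank power iteration that Modilink modifies, included only to make concrete what "processing link types", "iteration rules" and "terminating conditions" refer to. Assumptions not taken from the dissertation: a damping factor of 0.85, dangling pages (pages with no out-links) spreading their score uniformly over all pages, and stopping when the L1 change between successive score vectors drops below `tol`. The function name `pagerank` and its parameters are hypothetical.

```python
import numpy as np


def pagerank(adj, damping=0.85, tol=1e-8, max_iter=100):
    """Standard PageRank by power iteration (illustrative sketch, not Modilink).

    adj[i][j] = 1 means page i links to page j.
    Returns (scores, iterations_used); scores sum to 1.
    """
    a = np.asarray(adj, dtype=float)
    n = a.shape[0]
    out_deg = a.sum(axis=1)
    dangling = out_deg == 0

    # Row-normalize the link matrix; rows of dangling pages stay zero
    # and their score is redistributed explicitly inside the loop.
    p = np.zeros_like(a)
    p[~dangling] = a[~dangling] / out_deg[~dangling][:, None]

    r = np.full(n, 1.0 / n)  # uniform start vector
    for it in range(1, max_iter + 1):
        # Assumed dangling-page rule: spread their mass uniformly.
        dangling_mass = r[dangling].sum()
        r_new = damping * (r @ p + dangling_mass / n) + (1.0 - damping) / n
        # Terminating condition: L1 change between iterations below tol.
        if np.abs(r_new - r).sum() < tol:
            return r_new, it
        r = r_new
    return r, max_iter
```

On the three-page cycle 0 → 1 → 2 → 0, every page receives 1/3; the uniform start vector is already the fixed point, so the loop stops after one iteration. The returned iteration count is the kind of quantity a convergence-speed comparison between PageRank variants would measure, and changing `tol` or the dangling-page rule changes both the count and the scores.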
Considering the relationship between web page quality and the characteristics of hyperlink analysis algorithms, I proposed QHA1, a quality-based hyperlink analysis algorithm. The core of this algorithm is to take the value computed by Modilink as the web page quality factor in the...
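The abstract breaks off before explaining how QHA1 uses the Modilink quality factor, so the sketch below does not implement QHA1. It shows standard HITS (Kleinberg), the classic query-dependent algorithm named in the keywords, run on the link graph of a query-focused page set. When the leading singular value is unique, the authority and hub vectors converge to the leading right and left singular vectors of the adjacency matrix, which is one way HITS relates to the SVD keyword. The function name `hits`, the L2 normalization and the `tol`/`max_iter` stopping rule are assumptions for illustration.

```python
import numpy as np


def hits(adj, tol=1e-8, max_iter=100):
    """Standard HITS on a query-focused subgraph (illustrative sketch, not QHA1).

    adj[i][j] = 1 means page i links to page j.
    Returns (authority, hub) scores, each scaled to unit L2 norm.
    """
    a = np.asarray(adj, dtype=float)
    n = a.shape[0]
    auth = np.ones(n) / np.sqrt(n)
    hub = np.ones(n) / np.sqrt(n)
    for _ in range(max_iter):
        # A page is a good authority if good hubs point to it.
        new_auth = a.T @ hub
        new_auth /= np.linalg.norm(new_auth) or 1.0  # guard against all-zero
        # A page is a good hub if it points to good authorities.
        new_hub = a @ new_auth
        new_hub /= np.linalg.norm(new_hub) or 1.0
        # Terminating condition: both vectors changed by less than tol (L1).
        converged = (np.abs(new_auth - auth).sum() < tol
                     and np.abs(new_hub - hub).sum() < tol)
        auth, hub = new_auth, new_hub
        if converged:
            break
    return auth, hub
```

In a star where pages 0, 1 and 2 all link to page 3, page 3 gets the full authority score and the three linking pages share the hub score equally. Unlike PageRank, these scores depend on which pages the query pulls into the subgraph, which is what makes the approach query-dependent.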
Keywords/Search Tags:Web IR, Hyperlink, Hyperlink analysis algorithm, PageRank algorithm, HITS algorithm, Pre link analysis algorithm, Post link analysis algorithm, SVD, Bayes Network, TREC