Research on and Improvement of the PageRank Algorithm in Web Data Mining

Posted on: 2015-01-21; Degree: Master; Type: Thesis
Country: China; Candidate: W F Ping; Full Text: PDF
GTID: 2268330425985470; Subject: Computer technology
Abstract/Summary:
As Internet technology has developed, it has become increasingly convenient for users to access information. At the same time, finding useful information within such a large and complex body of data has become a problem. Web data mining offers a way to address this information overload on the Web. Web structure mining, which is based on hyperlink analysis, extracts useful information from the link structure and reorganizes content so that its logical structure is more reasonable. Web data mining has therefore become an active research area.
PageRank provides a useful starting point. The classical PageRank algorithm rests on the premise that each link represents an independent endorsement, by the author of a Web page, of the page it points to. In this thesis, we not only implement the iterative PageRank computation but also examine how effectively PageRank evaluates webpage quality. We analyze the shortcomings of the study by Fricke, which used World Wide Web question answering as its sample, and propose our own optimization method. Because evaluations of webpage quality usually involve personal opinion, we also formulate quality assessment criteria for webpage information to limit subjective judgment. Finally, experiments demonstrate the validity of our PageRank optimization method for evaluating webpage quality.
A search engine's ranking algorithm should place the results that users most need as close to the top as possible. The traditional PageRank algorithm suffers from topic drift, which degrades search results. Based on an analysis of a large number of webpage ranking algorithms, this thesis proposes an improved algorithm based on topic hyperlink similarity. The cosine similarity of Web link-relation vectors describes the topical correlation between webpages in the network. This avoids the burden of the additional text information that other improved algorithms require. Experimental results show that the TLSPR algorithm requires no additional space and does not increase the time complexity of the algorithm. It ranks the results that satisfy users near the top of the list, thereby improving search results and avoiding topic drift.
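The two building blocks named in the abstract, the iterative PageRank computation and the cosine similarity of link vectors, can be illustrated with a minimal Python sketch. This is not the thesis's TLSPR implementation, which the abstract does not specify. The damping factor of 0.85, the uniform handling of pages without outlinks, the use of binary outlink vectors as "link-relation vectors", and the names `pagerank`, `link_cosine_similarity`, and the toy graph `links` are all assumptions made for illustration.

```python
import math


def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Classical PageRank by power iteration.

    links maps each page to the list of pages it links to.
    damping=0.85 is a conventional choice, assumed here.
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages with no outlinks is spread uniformly (assumption).
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes its rank in equal shares to the pages it links to.
        for p in pages:
            targets = links.get(p, [])
            for q in targets:
                new_rank[q] += damping * rank[p] / len(targets)
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


def link_cosine_similarity(links, a, b):
    """Cosine similarity of the binary outlink vectors of pages a and b.

    Treating a page's outlink set as its link-relation vector is an
    assumption; the abstract does not define the vector.
    """
    out_a, out_b = set(links.get(a, [])), set(links.get(b, []))
    if not out_a or not out_b:
        return 0.0
    # For 0/1 vectors the dot product is the number of shared targets.
    return len(out_a & out_b) / math.sqrt(len(out_a) * len(out_b))


# Hypothetical three-page graph: A links to B and C; B and C link back to A.
links = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
ranks = pagerank(links)
sim_bc = link_cosine_similarity(links, "B", "C")
sim_ab = link_cosine_similarity(links, "A", "B")
```

In this toy graph, B and C have identical outlinks, so their link vectors are maximally similar, while A and B share no targets. A topic-sensitive variant could use such similarities to weight how rank flows along each link, although how the thesis combines the two is not stated in the abstract.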
Keywords/Search Tags: Web Data Mining, Structure Mining, PageRank Algorithm, Topic Drift