Research And Application Of PageRank Algorithm In Web Mining

Posted on: 2021-04-27
Degree: Master
Type: Thesis
Country: China
Candidate: Q Liu
Full Text: PDF
GTID: 2428330611997692
Subject: Engineering
Abstract/Summary:
With the rapid development of computer network technology, users have more and more ways to obtain information. Faced with such a huge volume of information resources, however, obtaining the information that is actually useful, efficiently and accurately, has become a problem. Web structure mining addresses this by analyzing the link relationships between web pages and combining them with the user's search topic, which gives users more comprehensive and accurate information. This thesis takes the PageRank algorithm of Web structure mining as its object, studies its mathematical model and practical application in depth, points out two of its problems (topic drift and a bias toward old web pages), and proposes improvements aimed at better search results. The main contents are as follows:
(1) The thesis first surveys Web data mining and search engines. It introduces the research background and development trends of Web data mining, the application scenarios, development status, advantages and disadvantages of its branches, and the principles and working process of search engines.
(2) To address topic drift, the thesis proposes a BM25 probabilistic retrieval model based on IDF term-frequency calculation and the binary retrieval model. Unlike the traditional cosine similarity calculation, this model is more flexible and efficient when computing the relevance between keywords and documents.
(3) To address the bias toward old web pages, the thesis introduces a time feedback factor that uses the number of search-engine retrieval cycles in place of a page's publication time. This avoids the problem that publication times cannot be extracted by a consistent rule because page structures differ, and it effectively compensates new, high-quality web pages.
(4) Based on the work in (2) and (3), an improved PageRank algorithm is proposed. To verify its advantages, original web pages are crawled with the web crawler tool Nutch, preprocessed, and stored in a database as the data set. Experiments are then run with both the original PageRank algorithm and the improved PageRank algorithm to verify the effectiveness of the improved algorithm.
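For orientation, the two standard building blocks named in the abstract can be sketched as follows. This is a minimal illustration and not the thesis's improved algorithm: the abstract does not give the formula for the time feedback factor or say how the BM25 score is combined with the link score, so neither is attempted here. The function names, the damping factor of 0.85, the BM25 parameters `k1=1.5` and `b=0.75`, and the non-negative IDF variant are all assumptions chosen for the example.

```python
import math
from collections import Counter


def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Standard PageRank by power iteration (not the thesis's improved variant).

    links maps each page to the list of pages it links to.
    Pages with no out-links spread their rank uniformly over all pages.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank mass held by dangling pages is redistributed evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query.

    Uses the non-negative IDF form log(1 + (N - df + 0.5) / (df + 0.5)),
    which is an assumption; the thesis may use a different variant.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            idf = math.log(1.0 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1.0 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    # Hypothetical four-page link graph and three toy documents.
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    print(pagerank(graph))
    corpus = [["web", "mining", "pagerank"],
              ["pagerank", "pagerank", "link", "analysis"],
              ["cooking", "recipes"]]
    print(bm25_scores(["pagerank"], corpus))
```

In this sketch the link score depends only on graph structure and the BM25 score depends only on query and document text, which is why a purely link-based ranking can drift off topic and why a content-relevance term is a natural complement.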
Keywords/Search Tags: Web mining, PageRank, BM25 model, Time feedback