
Research On Improving Page Ranking Algorithm Based On Time Feedback And Topic Relevance

Posted on: 2022-11-08
Degree: Master
Type: Thesis
Country: China
Candidate: H Liu
GTID: 2518306773481304
Subject: Finance and Tax
Abstract/Summary:
HITS is a popular web page ranking algorithm. However, as the amount of data on the Internet grows, the algorithm shows two problems: it favors old web pages, and the quality of its page ranking suffers. Many scholars have therefore proposed improvements to the algorithm.

Regarding the bias toward old pages, the results that rank highest for a query are often pages that have been on the Internet for a long time. Correcting this along the time dimension is difficult, because page release dates do not follow a standard format and the time parameter is therefore hard to obtain. This thesis instead uses the number of times T that the crawler reaches a page within a crawl cycle, and derives a time function of T from Newton's law of cooling to serve as the time weight. In addition, to account for the growth of user search behavior on the Internet, the thesis draws on the idea of N-Step page ranking: the probability that a user selects a page at the next step depends not only on the numbers of in-links and out-links of the page itself, but also on the in-links and out-links of all pages that have a link relationship with it. The time weight is then added, and the result is introduced into HITS as a weight factor, giving the Ti HITS algorithm (Timed HITS). In experiments under different query topics, the Ti HITS algorithm was compared with the original HITS algorithm and the TM-HITS algorithm. Ti HITS improved the hit ratio of the latest web pages in the search results by 20%-40% on average, which shows that it is effective at penalizing old pages.

Regarding page ranking quality, HITS assigns the same weight to every page, which gives unreasonable weights to pages of differing relevance. Some scholars have proposed a page popularity weight and a website authority weight in order to assign different weight values to each page. Combining the ideas behind these two weights, and noting that a page's authority value depends on its incoming links, this thesis proposes a page authority weight. From the perspective of the page itself, bounce rate is the prominent index for evaluating the stickiness of web content, so the thesis also considers bounce rate and integrates a bounce rate weight together with the page authority weight into the HITS ranking algorithm. It is important to note that new pages tend to receive fewer links than old pages and therefore do not receive high scores. The earlier experiments with the Ti HITS algorithm show that the time weight raises the ranking of new pages, so the new algorithm fuses the time weight with the two weights above. The result is the Ti WBRHITS algorithm (Timed Web Bounce Rate HITS). In experiments under different query topics, using data crawled from portal websites, the Ti WBRHITS algorithm was compared with the HITS algorithm and the IHITS algorithm. The experimental results show that the accuracy of Ti WBRHITS is 20%-30% higher than that of these two algorithms.
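The following Python sketch illustrates the general idea of a time-weighted HITS iteration. It is not the Ti HITS formulation from the thesis, whose exact formulas are not given in this abstract. It assumes that the time weight takes the exponential-decay form suggested by Newton's law of cooling, exp(-k * T), that a larger crawl count T marks an older page, and that the weight simply scales each page's authority update. The names `cooling_weight`, `decay`, and `page_weight`, as well as the toy graph and crawl counts, are hypothetical and chosen only for illustration. The N-Step selection probability, bounce rate weight, and page authority weight are not modeled.

```python
import math


def cooling_weight(crawl_count, decay=0.3):
    """Hypothetical time weight: exponential decay in the crawl count T.

    Assumption: a page seen in more crawl cycles is older and gets a lower
    weight. The decay constant is illustrative, not taken from the thesis.
    """
    return math.exp(-decay * crawl_count)


def _normalize(scores):
    """Scale scores to unit Euclidean length (all-zero input is left as is)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(graph, page_weight=None, iterations=50):
    """Iterative HITS on a dict mapping each page to the pages it links to.

    page_weight optionally scales each page's authority update; pages that
    are missing from it use 1.0, which reproduces plain HITS.
    """
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    page_weight = page_weight or {}
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of hub scores of in-linking pages, scaled by the weight.
        raw_auth = {p: 0.0 for p in pages}
        for source, targets in graph.items():
            for target in targets:
                raw_auth[target] += hub[source]
        auth = _normalize(
            {p: page_weight.get(p, 1.0) * s for p, s in raw_auth.items()}
        )
        # Hub: sum of authority scores of the pages this page links to.
        hub = _normalize(
            {p: sum(auth[t] for t in graph.get(p, [])) for p in pages}
        )
    return auth, hub


# Toy data (invented): three hub pages, one old target and one new target.
graph = {"h1": ["old", "new"], "h2": ["old", "new"], "h3": ["old"]}
crawl_counts = {"old": 10, "new": 1}
time_weight = {p: cooling_weight(t) for p, t in crawl_counts.items()}

base_auth, _ = hits(graph)
timed_auth, _ = hits(graph, page_weight=time_weight)
print("plain HITS:", base_auth["old"], base_auth["new"])
print("time-weighted:", timed_auth["old"], timed_auth["new"])
```

In this toy graph the old page has more in-links, so plain HITS ranks it above the new page, while the time-weighted variant reverses the order. This mirrors the qualitative effect the abstract reports, under the assumptions stated above.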
Keywords/Search Tags: HITS, Authority, Hub, Time weight