The Refresh Strategy For Webpage Of Large-scale Website In Search Engine

Posted on: 2011-08-31
Degree: Master
Type: Thesis
Country: China
Candidate: X Yi
Full Text: PDF
GTID: 2198360332470011
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, the network has become an indispensable means of obtaining information, and search engines based on Internet information retrieval have emerged and developed alongside it. However, as the number of pages soars and page content changes frequently, search engines cannot keep track of dynamic page content. Limited storage capacity, server bottlenecks, and other hardware resource constraints add further problems: the index database cannot be updated in a timely manner, and the quality of query results is not ideal. Designing an efficient page update strategy has therefore become a key problem for extracting high-quality pages and improving page freshness.

Large-scale websites are the core of information and the main source for search engines. Whether large-scale websites are handled effectively has a direct impact on the overall performance of a search engine. This thesis aims to improve the freshness of the index database through efficient refresh of the pages of large-scale websites.

Drawing on related research on page refresh strategies, the thesis analyzes in depth and compares three existing categories of strategy, and confirms the need to update the pages of large-scale websites by category. In view of the characteristics of large-scale sites, it considers factors such as the importance and freshness of web pages and friendliness toward web servers, and then designs a new classified refresh strategy oriented toward user experience. The strategy estimates how frequently the pages of a large-scale site change from each page's change history and divides the pages into three categories: rapid change, fast change, and slow change. It then uses user behavior analysis to determine the update speed and update time for each category of pages, and refreshes pages accordingly.

Finally, a page-categorizing refresh routine is designed and implemented on top of the Lucene toolkit and used to sample and analyze pages from two well-known websites, Sina and Sohu. The results show that the strategy greatly improves the efficiency of page updates, relieves the pressure on web servers, and ensures the timeliness and correctness of search results.
Keywords/Search Tags:Large-scale website, Search engine, Page refresh, User's experience
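
The three-way classification described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no thresholds or refresh intervals, so the cut-off rates, the interval table, and the names `PageHistory`, `classify`, and `refresh_interval_hours` are all assumptions made for illustration. The user-behavior analysis that the thesis uses to choose update times is omitted.

```python
from dataclasses import dataclass

# Hypothetical cut-offs in observed changes per day (not taken from the thesis).
RAPID_MIN_RATE = 1.0
FAST_MIN_RATE = 0.1

# Hypothetical refresh interval, in hours, for each category.
REFRESH_HOURS = {"rapid": 1, "fast": 24, "slow": 168}


@dataclass
class PageHistory:
    """Change history for one page, as recorded by earlier crawls."""
    url: str
    observed_days: float  # length of the observation window
    change_count: int     # number of changes detected in that window


def change_rate(page: PageHistory) -> float:
    """Average number of detected changes per day."""
    if page.observed_days <= 0:
        raise ValueError("observed_days must be positive")
    return page.change_count / page.observed_days


def classify(page: PageHistory) -> str:
    """Assign a page to the rapid, fast, or slow change category."""
    rate = change_rate(page)
    if rate >= RAPID_MIN_RATE:
        return "rapid"
    if rate >= FAST_MIN_RATE:
        return "fast"
    return "slow"


def refresh_interval_hours(page: PageHistory) -> int:
    """Look up the refresh interval assigned to the page's category."""
    return REFRESH_HOURS[classify(page)]
```

Under these assumed thresholds, a page that changed 40 times in 10 days falls into the rapid category and would be revisited hourly, while a page that changed once in 30 days falls into the slow category and would be revisited weekly.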