Study On Creation Technique For Block-based Web Archive

Posted on: 2010-05-29 | Degree: Master | Type: Thesis
Country: China | Candidate: C F Yu | Full Text: PDF
GTID: 2178360308478202 | Subject: Computer application technology
Abstract/Summary:
With the popularity and rapid development of the Internet, the World Wide Web has accumulated a large amount of information resources. As a huge resource and knowledge base, the Internet has become ever more closely tied to people's lives. Reading news, writing blogs, and finding information on the World Wide Web have become an essential part of daily life. However, as time goes by, the World Wide Web changes quietly and its size increases continuously, while part of its content disappears, is replaced, or is appended to.

While the World Wide Web is growing, some of its content is gradually disappearing, such as out-of-date Web pages and personal blogs. Content disappears in two ways: either the server hosting the Web page no longer provides service, or the old content has been replaced. In both cases the loss may be permanent. In view of this situation, many institutions have in recent years begun to research and build their own Web Archive Systems, and continue to expand their applications so that they become a growing knowledge base, a museum of Web history.

A Web Archive System collects a large number of Web pages that have existed on the Internet and processes them for future use and research. Its significance is that it preserves a large number of Web pages that would otherwise disappear over time, provides a more comprehensive data source for research on the Internet, can reconstruct Web pages similar to the originals, and supports further applications through more in-depth research built on it.

For this reason, this thesis proposes a Web page segmentation algorithm oriented toward Web Archive Systems and a prototype of a block-based Web Archive System.
The prototype can detect block-level changes in Web pages, perform block-level incremental storage, and provide data for querying and studying historical Web pages.

Compared with a traditional Web Archive System that works on whole Web pages, the method in this thesis works on blocks, so version comparison and storage are performed at the block level and overhead is reduced. The experimental results show that the proposed creation technique for a block-based Web Archive is feasible and effective.
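
The idea of block-level version comparison and incremental storage can be illustrated with a minimal sketch. It is not the thesis's algorithm: the abstract does not describe the segmentation step, so the sketch assumes a page has already been split into a list of block strings, and it assumes that blocks are compared by position using a SHA-256 content hash. The names `BlockArchive`, `add_version`, and `get_version` are hypothetical and chosen only for illustration.

```python
import hashlib


def block_digest(block_text: str) -> str:
    """Return a content hash used to tell whether a block has changed."""
    return hashlib.sha256(block_text.encode("utf-8")).hexdigest()


class BlockArchive:
    """Toy block-level archive (hypothetical): each distinct block is stored once."""

    def __init__(self):
        self.blocks = {}    # digest -> block text
        self.versions = {}  # url -> list of versions; a version is a list of digests

    def add_version(self, url, page_blocks):
        """Archive one crawl of `url` and return the indices of changed blocks."""
        digests = [block_digest(b) for b in page_blocks]
        history = self.versions.setdefault(url, [])
        previous = history[-1] if history else []
        # Assumption: blocks are compared position by position.
        changed = [
            i for i, d in enumerate(digests)
            if i >= len(previous) or previous[i] != d
        ]
        # Incremental storage: a block body is written only if it is new.
        for d, text in zip(digests, page_blocks):
            self.blocks.setdefault(d, text)
        history.append(digests)
        return changed

    def get_version(self, url, index):
        """Rebuild the block list of a stored version from its digests."""
        return [self.blocks[d] for d in self.versions[url][index]]


archive = BlockArchive()
archive.add_version("http://example.com/", ["header", "news A", "footer"])
changed = archive.add_version("http://example.com/", ["header", "news B", "footer"])
print(changed)              # [1]
print(len(archive.blocks))  # 4
```

In this sketch the second crawl adds only one new block body, because the unchanged header and footer are already stored. A whole-page archive would store all three blocks again, which is the overhead the block-based approach aims to reduce.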
Keywords/Search Tags: Web Archive, Historical Page, Web Page Block, Version Comparison