Research on Automatic Web Page Segmentation Technology for Handheld Devices

Posted on: 2010-06-04  Degree: Master  Type: Thesis
Country: China  Candidate: M H Yang  Full Text: PDF
GTID: 2208360275483840  Subject: Software engineering
Abstract/Summary:
With the development of telecommunication technology, convenient access to web sites from handheld devices has become an urgent need. However, handheld devices have small screens, limited computing power, and limited storage capacity, so they cannot browse ordinary web pages well. To solve this problem, this paper proposes a method for segmenting and reconstructing web pages. The method has three steps: first, structure the HTML document; second, segment the page and clean it; third, reconstruct the page.

This paper focuses on the following aspects. First, we compare the page representation method based on the DOM tree with the Vision-based Page Segmentation algorithm (VIPS) and the block-location-based method. VIPS defines block granularity through the degree of coherence (DoC) value, which makes block granularity difficult to control. This paper proposes an improved segmentation algorithm based on the visual features of the page, which replaces the DoC value of the original algorithm with a block-level value to solve this problem.

Second, we use block importance to clean noise. Block importance takes into account both the spatial features and the content features of a block, and indicates how strongly the block reflects the topic of the page. Block importance is divided into four levels; the lower the level, the more noise a block contains. Deleting the blocks with low importance levels therefore removes the noise.

The segmentation, reconstruction, and noise-cleaning methods in this paper take VIPS as their core idea and combine techniques such as page cleaning and block importance tagging to segment web pages more accurately.
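The noise-cleaning step described above can be illustrated with a minimal Python sketch. It assumes each block already carries an importance level from 1 (mostly noise) to 4 (main content). The `Block` class, the `prune` function, the `min_level` threshold, and the sample page are all hypothetical names chosen for illustration; the abstract does not say how levels are computed from spatial and content features, so they are assigned by hand here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    """A page block (hypothetical structure).

    `importance` runs from 1 (mostly noise) to 4 (main content).
    """
    name: str
    importance: int
    children: List["Block"] = field(default_factory=list)


def prune(block: Block, min_level: int = 2) -> Optional[Block]:
    """Return a copy of `block` without blocks below `min_level`.

    Returns None when `block` itself falls below the threshold.
    The input tree is not modified.
    """
    if block.importance < min_level:
        return None
    pruned_children = (prune(child, min_level) for child in block.children)
    kept = [child for child in pruned_children if child is not None]
    return Block(block.name, block.importance, kept)


def names(block: Block) -> List[str]:
    """List block names in depth-first order."""
    result = [block.name]
    for child in block.children:
        result.extend(names(child))
    return result


# Hand-labelled sample page; the levels are assumptions, not computed values.
page = Block("page", 4, [
    Block("banner_ad", 1),
    Block("nav_bar", 2),
    Block("article", 4, [Block("body_text", 4), Block("share_links", 1)]),
    Block("footer", 1),
])

cleaned = prune(page, min_level=2)
```

With `min_level=2`, only the level-1 blocks (`banner_ad`, `share_links`, `footer`) are dropped. Raising the threshold trades completeness for a smaller page, which may suit a device with a very small screen.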
Keywords/Search Tags: Vision-based page segmentation, Web page cleaning, Web page blocking, Block importance, DOM