
The Research And Implementation On Web Page Segmentation

Posted on:2013-07-01
Degree:Master
Type:Thesis
Country:China
Candidate:Y Chen
Full Text:PDF
GTID:2248330392456882
Subject:Computer technology
Abstract/Summary:
With the development of the Internet, the Web has become the largest information source. Yet while it offers a wealth of information, the Web also makes it hard for people to find useful information efficiently. Most information retrieval systems on the Web treat a web page as the smallest, indivisible unit. However, a page usually contains multiple topics that have little relevance to each other, along with varied content such as navigation, decoration, and contact information, so a page taken as a whole may not be appropriate for representing a single semantic topic. In addition, with the development of communication technology, handheld mobile devices such as PDAs and smartphones are improving rapidly, and how to display web pages properly on their small screens is another valuable topic. Web page segmentation offers a solution to both problems.

In this thesis, popular web page segmentation methods are first analyzed in detail, and the advantages and shortcomings of each kind of method in practical use are summarized. Building on previous research, a new web page segmentation method that makes use of various cues is then proposed and implemented. It takes several aspects into consideration: the general layout patterns of web pages, their visual information, and the structural information of their tag trees. The resulting segmentation matches how people segment web pages and what they segment them into, preserves the overall structure of the page, and makes later extraction and production of sub-block information easier.

The experimental results show that the new method segments general web pages efficiently and overcomes the shortcomings of some existing approaches. The method is further applied to web page transformation, especially for handheld mobile devices, where segments that follow the general layout patterns supply additional important information. This gives the new segmentation approach good application value.
Keywords/Search Tags:Web page segmentation, HTML DOM, visual cues, tags, layout patterns, string tree