The Research And Implementation Of HTML Pages Cleanup Based On Web

Posted on:2008-06-16

Degree:Master

Type:Thesis

Country:China

Candidate:B Liu

Full Text:PDF

GTID:2178360212491805

Subject:Computer application technology

Abstract/Summary:

With the growth of the Internet, more and more people rely on the information in web pages, and extracting information from web pages has become an active research topic in data mining. However, web pages often contain a great deal of clutter (such as pop-up ads, unnecessary images and extraneous links) that is unrelated to the subject of the page and interferes with the extraction of useful information, which makes web page cleanup very important. Based on an in-depth analysis of the data structure of web pages and of existing page cleanup techniques, this thesis proposes a new web page cleanup technique based on the DOM tree and develops a web page cleanup tool on Eclipse. The tool can effectively remove most of the information unrelated to the subject of a page, so it has good practical value and promising applications.
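
The abstract does not describe the cleanup rules themselves, so the following is only a minimal sketch of what DOM-tree-based cleanup can look like, not the method of the thesis. It builds a simple tree with Python's standard html.parser module, drops subtrees whose tags are treated as clutter, and drops block containers that consist mostly of link text. The names CLUTTER_TAGS, BLOCK_TAGS, MAX_LINK_DENSITY, Node, TreeBuilder, prune and clean_html, as well as the 0.6 threshold, are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Hypothetical rule set (assumptions for this sketch, not taken from the thesis).
CLUTTER_TAGS = {"script", "style", "iframe", "img", "form"}
BLOCK_TAGS = {"div", "ul", "ol", "table", "nav", "aside"}
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}
MAX_LINK_DENSITY = 0.6  # share of link text above which a block is dropped


class Node:
    """A minimal DOM node: a tag name plus child nodes and text strings."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # items are Node or str

    def text_length(self):
        return sum(len(c.strip()) if isinstance(c, str) else c.text_length()
                   for c in self.children)

    def link_text_length(self):
        if self.tag == "a":
            return self.text_length()
        return sum(c.link_text_length() for c in self.children
                   if isinstance(c, Node))


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML, tolerating stray end tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never get children
            self.current = node

    def handle_endtag(self, tag):
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data)


def prune(node):
    """Remove clutter subtrees in place, working bottom-up."""
    kept = []
    for child in node.children:
        if isinstance(child, str):
            kept.append(child)
            continue
        if child.tag in CLUTTER_TAGS:
            continue
        prune(child)
        total = child.text_length()
        if total == 0:
            continue  # nothing left in this subtree
        if (child.tag in BLOCK_TAGS
                and child.link_text_length() / total > MAX_LINK_DENSITY):
            continue  # mostly links: likely navigation or ads
        kept.append(child)
    node.children = kept


def extract_text(node):
    parts = [c.strip() if isinstance(c, str) else extract_text(c)
             for c in node.children]
    return " ".join(p for p in parts if p)


def clean_html(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    prune(builder.root)
    return extract_text(builder.root)


if __name__ == "__main__":
    page = ('<div><a href="/a">Home</a> <a href="/b">Ads</a></div>'
            '<div><p>Main text of the page.</p></div>')
    print(clean_html(page))
```

Link density is only one common heuristic; a real tool would likely combine several signals (tag type, position, repeated templates) and would keep the cleaned markup rather than plain text.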

Keywords/Search Tags:

DOM tree, page cleanup, reformatting, HTML document

Related items

1	The Research And Implementation On Web Page Segmentation
2	A Web Structure Clustering Algorithm For Mobile Page Adaptive Platform
3	Research Of Conversion From HTML Web Based On Contect Personalization
4	Research Of Web Page Purifying Method Based On Document Object Model
5	Study On The Tag-based Analysis Technique Of Extracting The Body Of The Page
6	Research On The Technology Based The Realization Of WAP Page Conversion
7	Automatic Classification Research On HTML Document And Implementation Of The Tool
8	Research On Mining Structure Of WEB Page For Information Extraction
9	Studies In Algorithms For Page Segmentation And Classification Of Document Images
10	Design And Implementation Of HTML5-Based PC Web Page Transformation System