The Research And Implementation Of HTML Pages Cleanup Based On Web

Posted on: 2008-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: B Liu
Full Text: PDF
GTID: 2178360212491805
Subject: Computer application technology
Abstract/Summary:
With the development of the Internet, more and more people rely on the information published on web pages, and information extraction from web pages has become one of the research hotspots in the field of data mining. However, web pages often contain a great deal of clutter (such as pop-up ads, unnecessary images and extraneous links) that is unrelated to the subject of the page and interferes with the extraction of useful information. Web page cleanup is therefore very important. Based on an in-depth analysis of the data structure of web pages and of existing page cleanup techniques, this thesis proposes a new web page cleanup technique based on the DOM tree and develops a web page cleanup tool on Eclipse. The tool can effectively clean up most of the information unrelated to the subject of a page, so it has good practical value and promising prospects for application.
Keywords/Search Tags: DOM tree, page cleanup, reformatting, HTML document
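
The abstract does not describe the cleanup algorithm itself, so the following is only a minimal sketch of what DOM-tree-based cleanup can look like, not the thesis's method. It builds a simplified tree with Python's standard `html.parser`, drops subtrees rooted at tags assumed to be noise, and drops block containers whose text is mostly link text. The names `Node`, `TreeBuilder`, `clean`, `cleanup_html`, the tag sets, and the `max_link_ratio` threshold of 0.6 are all illustrative assumptions.

```python
from html.parser import HTMLParser

# Assumed tag sets for this sketch; a real tool would tune these.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}
NOISE_TAGS = {"script", "style", "iframe", "img"}
BLOCK_TAGS = {"div", "ul", "ol", "table", "td", "nav"}


class Node:
    """A simplified DOM element: a tag plus child Nodes or text strings."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simplified element tree from (possibly sloppy) HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never contain children
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data.strip())


def text_stats(node, in_link=False):
    """Return (total text length, text length inside <a>) for a subtree."""
    total = linked = 0
    for child in node.children:
        if isinstance(child, str):
            total += len(child)
            linked += len(child) if in_link else 0
        else:
            t, l = text_stats(child, in_link or child.tag == "a")
            total += t
            linked += l
    return total, linked


def clean(node, max_link_ratio=0.6):
    """Prune noise tags and link-dominated block containers in place."""
    kept = []
    for child in node.children:
        if isinstance(child, str):
            kept.append(child)
            continue
        if child.tag in NOISE_TAGS:
            continue
        total, linked = text_stats(child)
        if child.tag in BLOCK_TAGS and total > 0 and linked / total > max_link_ratio:
            continue  # mostly link text: treated as navigation or ads
        clean(child, max_link_ratio)
        kept.append(child)
    node.children = kept


def extract_text(node):
    """Concatenate the text that survives cleanup, in document order."""
    parts = [c if isinstance(c, str) else extract_text(c) for c in node.children]
    return " ".join(p for p in parts if p)


def cleanup_html(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    clean(builder.root)
    return extract_text(builder.root)
```

Calling `cleanup_html` on a page string returns the text of the subtrees that were kept. The link-ratio test is applied only to block containers so that an ordinary paragraph containing a single inline link is not discarded.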