tags and style sheets. Web designers tend to place semantically related content in a single DIV block and to control the layout of that block with style sheets, a technique called "DIV plus CSS". Based on this observation, a web page is first partitioned into several blocks using DSS_DOM. Second, an evaluation algorithm assigns an importance value to every block, drawing on style sheet information and the structure of the DSS_DOM. The content of blocks with low importance values is treated as unrelated content, that is, noise. DSS_DOM identifies the basic data units by their structural and semantic features and determines the logical structure of the web page. The algorithm built on DSS_DOM estimates the importance of DIV blocks and identifies the unrelated blocks. The proposed technique is evaluated on two data mining tasks, Web search and Web page classification. Experimental results show that our noise elimination technique improves the mining results significantly.
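The two-step idea (partition the page into DIV blocks, then give each block an importance value and flag the low-scoring ones) can be illustrated with a minimal Python sketch. It is not the DSS_DOM algorithm, whose use of style sheet information and tree structure is not specified here. As a stand-in it assumes a much simpler heuristic: a block's importance is its share of the page's non-link text. The names `DivBlockCollector`, `score_blocks`, `noise_blocks`, the sample page and the threshold of 0.2 are all hypothetical.

```python
from html.parser import HTMLParser


class DivBlockCollector(HTMLParser):
    """Collect text and link-text character counts for each top-level DIV."""

    def __init__(self):
        super().__init__()
        self.blocks = []     # one dict per top-level DIV block
        self.div_depth = 0   # nesting depth of currently open DIVs
        self.link_depth = 0  # > 0 while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.div_depth == 0:
                attr = dict(attrs)
                name = attr.get("id") or attr.get("class") or ""
                self.blocks.append({"name": name, "text": 0, "link_text": 0})
            self.div_depth += 1
        elif tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
        elif tag == "a" and self.link_depth > 0:
            self.link_depth -= 1

    def handle_data(self, data):
        if self.div_depth == 0:
            return  # text outside any DIV block is ignored
        n = len(data.strip())
        self.blocks[-1]["text"] += n
        if self.link_depth > 0:
            self.blocks[-1]["link_text"] += n


def score_blocks(html):
    """Return (name, importance) pairs, one per top-level DIV block.

    Assumed heuristic (not from the paper): importance is the block's
    non-link text divided by all text found in DIV blocks.
    """
    collector = DivBlockCollector()
    collector.feed(html)
    collector.close()
    total = sum(b["text"] for b in collector.blocks) or 1
    return [(b["name"], (b["text"] - b["link_text"]) / total)
            for b in collector.blocks]


def noise_blocks(html, threshold=0.2):
    """Names of blocks whose importance falls below the assumed threshold."""
    return [name for name, score in score_blocks(html) if score < threshold]


PAGE = """
<div id="nav"><a href="/">Home</a> <a href="/news">News</a></div>
<div id="main"><p>DSS_DOM partitions a page into blocks and scores each one.</p></div>
<div id="footer"><a href="/about">About us</a></div>
"""

print(noise_blocks(PAGE))  # ['nav', 'footer']
```

On this sample page the navigation and footer blocks contain only link text, so they score zero and are flagged, while the main block keeps most of the weight. A faithful implementation would replace the link-text heuristic with the style sheet and structural features that the evaluation algorithm relies on.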