Text classification and clustering are two important tasks in information processing. Traditional classification and clustering algorithms target plain-text files, but with the growth of the Internet, semi-structured web data have become the main object of information processing, which has driven these algorithms to evolve. This paper focuses on how to achieve high classification and clustering precision by combining web-mining techniques with existing methods. Our position is that a page's location in the site topology reflects the site manager's view of the page's content and category, and that this information is very helpful for classification and clustering. We extract hierarchical category information about pages through web content mining and web structure mining, and use this information to classify and cluster the pages.
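The abstract does not say how the hierarchy information is encoded or how it is combined with the page text. A minimal sketch, assuming that the directory path in a page's URL approximates its position in the site topology, could blend a text similarity with a hierarchy similarity before clustering. All function names and the weight `alpha` below are hypothetical illustrations, not the paper's method.

```python
from collections import Counter
from math import sqrt
from urllib.parse import urlparse


def path_segments(url):
    """Directory segments of the URL path (assumed proxy for site-topology position)."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Heuristic assumption: a final segment containing '.' is a file name, not a directory.
    if parts and "." in parts[-1]:
        parts = parts[:-1]
    return parts


def hierarchy_similarity(url_a, url_b):
    """Fraction of shared leading directory segments (1.0 means same directory)."""
    a, b = path_segments(url_a), path_segments(url_b)
    if not a and not b:
        return 1.0  # both pages sit at the site root
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / max(len(a), len(b))


def text_similarity(text_a, text_b):
    """Cosine similarity of simple bag-of-words term counts."""
    ca, cb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def combined_similarity(page_a, page_b, alpha=0.3):
    """Weighted blend of text and hierarchy similarity; alpha is a hypothetical weight."""
    text_sim = text_similarity(page_a["text"], page_b["text"])
    hier_sim = hierarchy_similarity(page_a["url"], page_b["url"])
    return (1 - alpha) * text_sim + alpha * hier_sim


if __name__ == "__main__":
    p1 = {"url": "http://example.com/sports/football/a.html", "text": "match score goal"}
    p2 = {"url": "http://example.com/sports/tennis/b.html", "text": "match score set"}
    print(combined_similarity(p1, p2))
```

With this design, two pages with similar wording are pulled closer together when the site places them under the same branch, which is one way to let the manager's structural view influence a downstream clustering or nearest-neighbour classifier.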