| To organize and analyze massive Web information resources effectively and to help users quickly obtain the knowledge and information they need, Web pages must be categorized automatically by their content. The rapid development of the Web provides an unprecedented experimental environment and application platform for automatic text categorization, but it also poses new challenges. Research that builds on traditional techniques while addressing the specific features of Web pages is therefore needed. This dissertation explores automatic categorization of Chinese Web pages, a research topic of great theoretical significance and broad application prospects. The main contributions of this dissertation are as follows: (1) Automatic noise reduction for Chinese Web pages. Compared with plain text, Web pages are designed freely and contain a great deal of noise, which degrades the quality of Web page categorization. This dissertation therefore proposes an approach that automatically reduces noise in Chinese Web pages by exploiting both their structural information and their content, and that is integrated with automatic Chinese Web page categorization. The experimental results show that this method not only reduces noise in Chinese Web pages effectively but also improves the quality of Web page categorization. (2) A term reduction algorithm for Chinese Web pages. One of the problems facing automatic categorization of Chinese Web pages is that the dimensionality of the term space is too high. This dissertation therefore proposes an approach to reduce the number of terms. The experimental results show that this method effectively improves the quality of Web page categorization. |