With the rapid development of information technology and the growing popularity of the Internet, the number of web pages has increased enormously. Automatically classifying web pages by their content has therefore become an important research topic for organizing and processing such a large volume of data. This paper first introduces several techniques used in Chinese web page categorization and examines three machine-learning-based automatic classification algorithms: Category Centroid, Naive Bayes, and Support Vector Machine (SVM). We then implement an automatic Chinese web page classification system based on the vector space model and study Chinese web page classification through four experiments. The main conclusions of the experiments are as follows: the linear kernel function of SVM is more suitable for Chinese web page categorization; document frequency is a fast and effective feature selection method for Chinese web pages; and the optimal number of features depends on the size of the training set and on the classification algorithm. Finally, based on the characteristics of Chinese web pages, we propose a pre-classification algorithm that uses a given keyword list, and we combine it with Category Centroid, Naive Bayes, and SVM respectively. The experimental results show that this algorithm not only improves precision and recall but also greatly reduces the time required.
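
To make the vector space pipeline concrete, the following is a minimal sketch of document frequency feature selection followed by a Category Centroid classifier. It is an illustration under stated assumptions, not the system described in this paper: the function names, the raw term-frequency weighting, the cosine similarity measure, and the toy data are all hypothetical, and the documents are assumed to be already segmented into words (Chinese word segmentation is not shown).

```python
import math
from collections import Counter, defaultdict


def select_by_document_frequency(docs, k):
    """Return the k terms that occur in the most documents (hypothetical helper)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    # Sort by descending document frequency, then alphabetically for stable ties.
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def to_vector(tokens, features):
    """Raw term-frequency vector restricted to the selected features."""
    counts = Counter(tokens)
    return [float(counts[f]) for f in features]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def train_centroids(docs, labels, features):
    """Average the document vectors of each category into one centroid."""
    grouped = defaultdict(list)
    for tokens, label in zip(docs, labels):
        grouped[label].append(to_vector(tokens, features))
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def classify(tokens, centroids, features):
    """Assign the category whose centroid is most similar to the document."""
    vec = to_vector(tokens, features)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy, pre-segmented training data (assumed for illustration only).
train_docs = [
    ["football", "match", "team", "news"],
    ["team", "coach", "match", "news"],
    ["stock", "market", "bank", "news"],
    ["bank", "stock", "fund", "news"],
]
train_labels = ["sports", "sports", "finance", "finance"]

features = select_by_document_frequency(train_docs, k=5)
centroids = train_centroids(train_docs, train_labels, features)
predicted = classify(["team", "match", "today"], centroids, features)
```

In this sketch the number of retained features `k` is a free parameter, which mirrors the observation above that the best feature count depends on the training set size and the classifier. Swapping `classify` for a Naive Bayes or SVM classifier would leave the feature selection step unchanged.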