| With the rapid development and popularity of the World Wide Web, we have moved from an age in which useful information was scarce into a digital age in which information is extremely abundant. Faced with this vast amount of online information, it is difficult to find useful information quickly and effectively. How to organize and manage the huge amount of online information has therefore become an important research topic. As the amount of information on the Internet increases rapidly, processing it manually is time-consuming. Text information is abundant on the Internet, and Web text categorization has become one of the important tools for acquiring information there. The main contents of the thesis are as follows. (1) The key techniques for Web news text categorization, including document representation, feature selection, and classification, are discussed and studied together with their difficulties. (2) A method for recognizing Chinese named entities based on rules and statistics is proposed. The method recognizes Chinese named entities by constructing internal and external rules and combining them with a statistical method. The experiments show that this method achieves higher precision and recall. (3) The topic-constraining effect that the Chinese named entity elements of news have on Web news texts is studied, and these named entities are applied to topic recognition of Web news texts. The experimental results demonstrate that the method achieves better recognition performance. (4) A model of Web news text based on SNE (story of named entities) is proposed. Classifiers are constructed on the basis of SNE for automatic recognition and classification of Web news texts. The experimental results demonstrate that the classifier achieves high accuracy on Chinese Web texts. |