Information is abundant on the Internet, which is becoming a vital source of knowledge. The sheer volume, however, makes it hard to find valuable information efficiently, so organizing the information on the Internet is very important. Our research focuses on automatic text categorization of Chinese Web documents within the information collection stage of focused crawling. This paper first discusses the background of the task and introduces the primary technologies used in information collection for focused crawling. We then design an information collection model for focused crawling, comprising topic selection, initial URL selection, spider crawling, page parsing, Chinese text segmentation, and text classification, and discuss its primary functions and algorithms together with Java source code. Next, we introduce the text categorization method used in this system, the Naive Bayes classifier. Finally, we evaluate the Naive Bayes categorization method experimentally. The Naive Bayes model is a classifier based on probability statistics; although it relies on the independence assumption, it is still a very efficient classifier. Experiments show that its categorization accuracy can reach 90%.
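
To illustrate the kind of classifier described above, the following is a minimal sketch of a Naive Bayes text classifier in Java. It is not the system's actual source code. It assumes a multinomial model with add-one (Laplace) smoothing, which is one common variant, and it assumes each document has already been split into terms by a word segmenter. The class name `NaiveBayesSketch`, its method names, and the English placeholder terms are all hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: multinomial Naive Bayes with add-one smoothing.
class NaiveBayesSketch {
    // Number of training documents seen per category.
    private final Map<String, Integer> docCounts = new HashMap<>();
    // Per-category term frequencies.
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
    // Total number of term occurrences per category.
    private final Map<String, Integer> totalTerms = new HashMap<>();
    // All distinct terms seen during training.
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Add one segmented document with a known category to the model.
    public void train(String category, List<String> terms) {
        totalDocs++;
        docCounts.merge(category, 1, Integer::sum);
        Map<String, Integer> counts =
                termCounts.computeIfAbsent(category, k -> new HashMap<>());
        for (String term : terms) {
            counts.merge(term, 1, Integer::sum);
            totalTerms.merge(category, 1, Integer::sum);
            vocabulary.add(term);
        }
    }

    // Return the category with the highest log-posterior score,
    // or null if the model has not been trained.
    public String classify(List<String> terms) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : docCounts.keySet()) {
            // Log prior: share of training documents in this category.
            double score = Math.log((double) docCounts.get(category) / totalDocs);
            Map<String, Integer> counts = termCounts.get(category);
            int total = totalTerms.getOrDefault(category, 0);
            // Independence assumption: sum the per-term log likelihoods.
            for (String term : terms) {
                int n = counts.getOrDefault(term, 0);
                score += Math.log((n + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("sports", Arrays.asList("football", "match", "goal"));
        nb.train("finance", Arrays.asList("stock", "market", "rise"));
        System.out.println(nb.classify(Arrays.asList("football", "match")));
    }
}
```

Scores are summed in log space so that multiplying many small probabilities does not underflow, and the add-one smoothing keeps a term that was never seen in a category from driving that category's probability to zero.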