Classification System Based On The Theme Of Information Acquisition In The Pages

Posted on:2007-05-01

Degree:Master

Type:Thesis

Country:China

Candidate:X R Wan

GTID:2208360185453699

Subject:Computer application technology

Abstract/Summary:

With information available in abundance on the Internet, the Internet is becoming a vital source of knowledge. However, the sheer volume of information makes it hard to find valuable content efficiently, so organizing the information on the Internet is very important. Our research focuses on the automatic text categorization of Chinese Web documents within the information collection stage of focused crawling. First, this thesis discusses the background of the task and introduces the primary technologies used in information collection for focused crawling. We then design an information collection model for focused crawling that comprises topic selection, initial URL selection, spider crawling, page parsing, Chinese text splitting (word segmentation), and text classification. Next, the main functions and algorithms are discussed together with their Java source code, and the text categorization method used in the system, the Naive Bayes classifier, is introduced. Finally, the Naive Bayes categorization method is evaluated experimentally. The Naive Bayes model is a classifier based on probability statistics; although it relies on the independence assumption, it is still a very efficient classifier. Experiments show that its categorization accuracy can reach 90%.
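The abstract does not reproduce the thesis's Java source code, so the following is only a minimal Python sketch of how a Naive Bayes text classifier of the kind described might work. It assumes that documents have already been split into words by the Chinese text splitter, and it uses a multinomial model with add-one (Laplace) smoothing, which is a common choice but not confirmed by the abstract. The class name `NaiveBayesClassifier`, its methods, and the toy training data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative only)."""

    def __init__(self):
        self.doc_counts = Counter()              # number of training documents per category
        self.term_counts = defaultdict(Counter)  # term frequencies per category
        self.vocabulary = set()                  # all terms seen during training

    def train(self, tokens, category):
        """Add one pre-segmented document (a list of words) to a category."""
        self.doc_counts[category] += 1
        self.term_counts[category].update(tokens)
        self.vocabulary.update(tokens)

    def log_scores(self, tokens):
        """Return log P(category) + sum of log P(token | category) for each category."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for category, n_docs in self.doc_counts.items():
            counts = self.term_counts[category]
            total_terms = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokens:
                # Independence assumption: each token contributes its own factor.
                score += math.log((counts[token] + 1) / (total_terms + vocab_size))
            scores[category] = score
        return scores

    def classify(self, tokens):
        """Return the category with the highest log score."""
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Hypothetical toy corpus: each document is already segmented into words.
training_data = [
    (["股票", "市场", "上涨"], "finance"),
    (["银行", "利率", "股票"], "finance"),
    (["球队", "比赛", "进球"], "sports"),
    (["比赛", "冠军", "球队"], "sports"),
]

classifier = NaiveBayesClassifier()
for tokens, category in training_data:
    classifier.train(tokens, category)

print(classifier.classify(["股票", "利率"]))
print(classifier.classify(["球队", "冠军"]))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term keeps a word unseen in one category from forcing that category's probability to zero.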

Keywords/Search Tags:

Focused crawling, Spider crawling, Chinese text splitter, Chinese text categorization, Naive Bayes Classifier

Related items

1	Chinese WEB Document Automatic Categorization
2	A Study On Chinese Text Automatic Categorization
3	Research And Application On Web Crawling And Text Mining Technology
4	Research And Implementation Of Focus Crawling Spider Based On A. T. C And Optimized Hyperlink Chosen Strategy
5	A Study On Chinese Text Categorization
6	The Study Of Chinese Text Categorization Based On Concept
7	The Extension Language King Figure Focused Crawling Extractor Experimental Studies
8	Research On Chinese Text Classifier Based On Probability Method
9	The Study Of Chinese Text Categorization Based On Naïve Bayes
10	Study And Realization Of Text Categorization In Chinese Speech Recognition Results