Research Of Text Categorization Based On Vector Space Model

Posted on: 2007-06-22 | Degree: Master | Type: Thesis
Country: China | Candidate: L H Su | Full Text: PDF
GTID: 2178360182977806 | Subject: Signal and Information Processing
Abstract/Summary:
Text is the main information carrier on the Internet. An automatic text categorization system can organize and manage text information effectively, locate information accurately and rapidly, and support effective information extraction. This paper studies in detail the technology of text classification based on the vector space model. Term weighting, term selection, and classifier construction, the key components of a text classification system, are introduced and analyzed. The paper discusses the traditional term weighting algorithm, TF-IDF, and presents the introduction of an inter-class factor into term weighting. For feature selection, the paper explains why mutual information (MI) yields low precision in classification and proposes an improved algorithm. Experimental results show that the improved algorithms outperform the traditional methods in classification precision. In the experiments, four different classifiers are considered, and multi-hierarchy text classification based on Rocchio is performed.
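For readers unfamiliar with the baseline techniques named in the abstract, the following is a minimal sketch of traditional TF-IDF term weighting combined with a centroid classifier, which is the prototype-only form of Rocchio (no negative-example term). It does not reproduce the thesis's inter-class factor or its improved MI selection, since the abstract does not specify them. All function names and the toy corpus are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def weight(tokens, idf):
    """Map a token list to an L2-normalized TF-IDF vector (dict term -> weight)."""
    tf = Counter(t for t in tokens if t in idf)  # ignore terms unseen in training
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def tfidf_vectors(docs):
    """Compute IDF over the training documents and weight each document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / d) for t, d in df.items()}
    return [weight(doc, idf) for doc in docs], idf


def train_centroids(vectors, labels):
    """Average the document vectors of each class (positive prototypes only)."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        for t, w in vec.items():
            sums[label][t] += w
    return {
        label: {t: w / counts[label] for t, w in terms.items()}
        for label, terms in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(vec, centroids):
    """Assign the class whose centroid is most similar to the document vector."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["goal", "match", "team"],
    ["team", "score", "goal"],
    ["stock", "market", "price"],
    ["market", "trade", "stock"],
]
labels = ["sports", "sports", "finance", "finance"]

vectors, idf = tfidf_vectors(docs)
centroids = train_centroids(vectors, labels)
print(classify(weight(["goal", "team", "win"], idf), centroids))
print(classify(weight(["stock", "price"], idf), centroids))
```

In this sketch each class is represented by a single averaged vector, so classification costs one cosine comparison per class. A hierarchical variant could apply the same step first over top-level categories and then over the subcategories of the winning category, though the abstract does not describe how the thesis structures this.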
Keywords/Search Tags: Text Categorization, Vector Space Model, Term Weighting, Feature Selection, Hierarchy Classification