
An empirical study on hierarchical text categorization

Posted on: 2009-05-01
Degree: M.Sc
Type: Thesis
University: University of Guelph (Canada)
Candidate: Wang, Wei
Full Text: PDF
GTID: 2448390002995684
Subject: Computer Science
Abstract/Summary:
Text categorization is the process of automatically assigning new documents to a set of predefined categories. Although many statistical approaches have been applied to text categorization, there is still a need to understand the strengths and weaknesses of individual methods and to find ways of combining them for improved performance. This thesis makes a number of improvements to hierarchical text categorization, including data analysis for a detailed comparison of four major categorization methods, new ways of combining features across multiple categories, a more efficient training method for K-Nearest Neighbors, data smoothing for Maximum Entropy Modeling, and different ways of combining multiple text categorization methods.
Keywords/Search Tags: Text categorization