Text categorization is the process of automatically assigning new documents to categories drawn from a predefined set. Although many statistical approaches have been applied to text categorization, the strengths and weaknesses of individual methods are still not well understood, and ways of combining methods for improved performance remain to be explored. This thesis contributes several improvements to hierarchical text categorization: a data analysis that compares four major categorization methods in detail, new ways of combining features across multiple categories, a more efficient training method for K-Nearest Neighbors, data smoothing for Maximum Entropy Modeling, and several ways of combining multiple text categorization methods.