
Research On Hierarchical Classification For Chinese Text And Its Application In Tang Poetry Classification

Posted on: 2007-05-14
Degree: Master
Type: Thesis
Country: China
Candidate: X Xiao
Full Text: PDF
GTID: 2178360185974523
Subject: Computer software and theory
Abstract/Summary:
Text classification is an important technology for large-scale information processing. In most existing text classification methods, whether two-class or multi-class, all categories sit at the same level, that is, in a single flat category space. When the number of texts is large, the performance of flat classification is severely limited. However, some categories share common characteristics that set them apart from the others, so they can be grouped into a category set. This observation motivates hierarchical classification, which organizes categories into a tree-structured taxonomy according to their hierarchical relations. In both structure and performance, hierarchical classification is a substantial improvement over flat classification and is an effective classification method.

This thesis studies hierarchical classification methods for Chinese text. The main research content covers the following two aspects:

1. A new Feature Dual-Selection (FDS) method and a Hierarchical Text Classification (HTC) algorithm, both based on the vector space model. Given the structural characteristics of hierarchical text classification, this thesis takes into account that texts at different levels place different demands on both feature selection and the categorization method. A feature that contributes strongly at one level is not necessarily important at another. To reflect the importance of each feature at each level of the hierarchy, FDS performs feature selection at every level, sets a weight coefficient δ, and adjusts the number of features and the term weighting method accordingly.
Rather than relying on a single method, as general hierarchical classification algorithms do, the HTC algorithm combines the center (centroid) classification method with the Support Vector Machine (SVM), which are more effective for broad classification and for fine-grained subdivision, respectively. HTC governs the choice between the two methods through a difference threshold value α, and in this way optimizes the choice of classification method.
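The abstract does not say how the difference threshold α is computed, so the following is only a minimal sketch of one plausible reading, not the thesis's algorithm. It assumes that α is compared against the gap between the two highest centroid similarities at a node: a clear gap keeps the centroid decision, and a small gap hands the document to a second classifier. The names `cosine`, `build_centroids`, `classify_node`, and the `fallback` callable (standing in for a trained SVM) are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_centroids(docs, labels):
    """Average the document vectors of each category into one centroid."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for doc, label in zip(docs, labels):
        counts[label] += 1
        for term, w in doc.items():
            sums[label][term] += w
    return {
        label: {term: w / counts[label] for term, w in vec.items()}
        for label, vec in sums.items()
    }


def classify_node(doc, centroids, alpha, fallback):
    """Pick a category at one node of the hierarchy.

    Assumption (not from the thesis): alpha is a threshold on the gap
    between the best and second-best centroid similarity. `fallback` is a
    hypothetical callable standing in for an SVM trained on this node.
    Returns (label, name_of_method_used).
    """
    ranked = sorted(
        ((cosine(doc, c), label) for label, c in centroids.items()),
        reverse=True,
    )
    best_score, best_label = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
    if best_score - runner_up >= alpha:
        return best_label, "centroid"
    return fallback(doc), "fallback"
```

In a full hierarchy this function would be applied once per level, from the root down, with a separate set of centroids and a separate fallback classifier for each node.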
Keywords/Search Tags: Text Classification, Hierarchical Text Classification, Vector Space Model, Maximum Entropy Model, Tang Poetry