Research Of Hierarchy Document Classification

Posted on:2008-06-24

Degree:Master

Type:Thesis

Country:China

Candidate:S Li

GTID:2178360212993742

Subject:Computer system architecture

Abstract/Summary:

With the rapid growth of information on the Internet, information processing has become an increasingly necessary tool for finding useful information. Text classification, which assigns texts to classes according to their content under a given class system, is one of its most important research areas. Since the 1990s, the Internet has grown so dramatically that it now holds a huge amount of raw information, including text, sound, and images. How to extract the most valuable information from this huge, disordered body of text is one of the goals of information processing. Recently, automatic text classification has been combined with search engines and with information push, distribution, and filtering, and it has effectively improved information services.

Automatic text classification is the problem of automatically assigning predefined categories to free-text documents. Since its beginnings it has moved from rule-based methods to statistics-based methods, and it has now entered a phase that combines the two.

This thesis covers the following.

First, we give a general introduction to the concepts, methods, categories, and applications of document classification, and we design and implement a simple document classification system.

Second, we propose a hierarchical text classification model to address the shortcomings of the traditional methods. In this approach, the predefined topic categories are organized as a tree according to given hierarchical relations, and the classification task is divided into sub-tasks that correspond to the structure of the hierarchy. Each internal node of the hierarchy has a classifier trained on the samples. Using these hierarchical classifiers, a new document is classified into one leaf node of the hierarchy, starting from the root.

In other words, all the training documents in a class are combined into a single class-document. To construct the class models, it is only necessary to compare the class-documents attached to the same node in the same layer. To classify a document, a matching process is performed hierarchically from the root node toward the leaf nodes until a corresponding subclass is found.

Finally, the experiments show that the classification precision of this method is close to that of the traditional methods, while it greatly improves the efficiency of document classification.
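The top-down matching described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not say how a document is matched against a class-document, so the cosine similarity over raw term counts used here is an assumption, and the `Node`, `build`, `cosine`, and `classify` names and the toy category tree are hypothetical.

```python
from collections import Counter
from math import sqrt


class Node:
    """A category in the class tree; leaf nodes hold the training documents."""

    def __init__(self, name, children=None, docs=None):
        self.name = name
        self.children = children or []
        self.docs = docs or []
        self.class_doc = Counter()  # merged term counts, filled in by build()


def build(node):
    """Combine every training document under a node into one class-document."""
    for text in node.docs:
        node.class_doc.update(text.lower().split())
    for child in node.children:
        node.class_doc.update(build(child))
    return node.class_doc


def cosine(a, b):
    """Cosine similarity of two term-count vectors (assumed matching measure)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(root, text):
    """Walk from the root to a leaf, comparing only sibling class-documents."""
    query = Counter(text.lower().split())
    node, path = root, [root.name]
    while node.children:
        node = max(node.children, key=lambda c: cosine(query, c.class_doc))
        path.append(node.name)
    return path


# Hypothetical two-level category tree with toy training documents.
root = Node("root", children=[
    Node("sports", children=[
        Node("football", docs=["the striker scored a goal in the match",
                               "the goalkeeper saved a penalty"]),
        Node("tennis", docs=["she won the set with a strong serve",
                             "the tennis match went to a tie break"]),
    ]),
    Node("science", children=[
        Node("physics", docs=["the particle has mass and energy",
                              "quantum theory describes the electron"]),
        Node("biology", docs=["the cell divides and copies its dna",
                              "the enzyme binds the protein"]),
    ]),
])
build(root)

print(classify(root, "striker scored a late goal"))
print(classify(root, "the electron has energy"))
```

At each level only the children of the current node are compared, so a document is matched against a handful of class-documents per layer instead of against every leaf class. This is one plausible reading of where the efficiency gain reported in the abstract comes from.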

Keywords/Search Tags:

text categorization, hierarchy, accuracy, efficiency

Related items

1	The Research Of Automatic Text Categorization System Based On Neural Networks
2	A Study On Text Categorization Based On Machine Learning
3	Research Of Text Categorization Based On Vector Space Model
4	Research Of Hierarchical Text Categorization System Based On VSM And Rule Matching
5	Study On Text Categorization Method Based On Support Vector Machine
6	Research On Text Categorization Based On LDA And SVM
7	Research And Implementation Of Chinese Text Categorization Methods Based On Tree-like Keywords Set
8	A Study On M3-kNN Network And Application In Text Categorization
9	Text Categorization Research Based On Support Vector Machine
10	Studies On Some Essential Problems In Automatic Text Categorization