Text Classification Based On Gravitation Field Model

Posted on:2013-03-17

Degree:Master

Type:Thesis

Country:China

Candidate:J Li

GTID:2248330362474399

Subject:Computer software and theory

Abstract/Summary:

With the development of the IT industry, and especially the universal adoption of the Internet, information processing has become a critical technology for obtaining useful information. Automatic text classification assigns a text document to predefined categories based on the document's contents, and it is an important research topic in information processing.

This thesis first describes the framework of a Chinese automatic text classification system. It then introduces several techniques associated with Chinese text categorization and reviews several classic text classification methods together with their advantages and disadvantages. Finally, it proposes a new method for Chinese text classification.

Inspired by the gravitation field, this thesis designs a new method for text classification under a gravitation field model, called Virtual Kernel (VK). The main idea of the method is:
1) In the training stage, VK builds a classification model by obtaining a "category virtual kernel" for each category. The kernel is obtained by computing the field strength, a specified mathematical transform of term frequency, at each term point from the labeled texts in the training set.
2) In the test stage, when an unlabeled text arrives, VK computes, according to a given rule, the attractive force that each category virtual kernel exerts on the text.
3) VK assigns the unlabeled text to the category whose kernel exerts the strongest attractive force.

By its nature, this approach assigns an unlabeled text to a category according to the relationships between text features and the predefined categories.

To verify the utility of VK, this thesis reports a set of experiments in which texts are represented with the vector space model, and VK is compared with two classic text classification methods, kNN and Naive Bayes, using two feature selection methods, document frequency (DF) and information gain (IG). The experiments were run on two corpora and lead to the following conclusions:
1) VK outperforms kNN and Naive Bayes in both running time and classification performance.
2) VK still gives satisfactory classification results on the imbalanced corpus.
3) VK's classification performance does not depend strongly on the size of the training set.
4) Considering the feature selection methods alone, IG is superior to DF.
5) The quality of the corpus directly affects the classification results.
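The abstract names the two VK stages but does not give the exact field-strength transform or the rule for the attractive force. The Python sketch below fills both gaps with hypothetical choices so the train/test flow can be seen end to end: field strength is assumed to be `log(1 + tf)`, each category kernel is assumed to be the per-term mean field strength over that category's training texts, and the attractive force is assumed to be a dot product between the kernel and the text's own field strengths. All function names here are illustrative, not the thesis's code.

```python
import math
from collections import Counter, defaultdict


def field_strength(tf):
    # ASSUMPTION: the thesis only calls field strength "a specified
    # mathematical transform of term frequency"; log(1 + tf) is a stand-in.
    return math.log1p(tf)


def train_virtual_kernels(docs, labels):
    """Training stage (hypothetical): build one virtual kernel per category.

    docs: list of token lists; labels: list of category names.
    Each kernel maps term -> mean field strength of that term over the
    category's training documents.
    """
    sums = defaultdict(Counter)
    doc_counts = Counter(labels)
    for tokens, label in zip(docs, labels):
        for term, tf in Counter(tokens).items():
            sums[label][term] += field_strength(tf)
    return {
        cat: {term: total / doc_counts[cat] for term, total in terms.items()}
        for cat, terms in sums.items()
    }


def attractive_force(kernel, tokens):
    # ASSUMPTION: force = sum over the text's terms of the kernel's strength
    # at that term times the text's own field strength (a dot product).
    return sum(kernel.get(term, 0.0) * field_strength(tf)
               for term, tf in Counter(tokens).items())


def classify(kernels, tokens):
    # Test stage: pick the category whose kernel attracts the text most.
    return max(kernels, key=lambda cat: attractive_force(kernels[cat], tokens))


if __name__ == "__main__":
    train_docs = [["stock", "market", "price"], ["stock", "bank"],
                  ["match", "goal", "team"], ["team", "coach"]]
    train_labels = ["finance", "finance", "sports", "sports"]
    kernels = train_virtual_kernels(train_docs, train_labels)
    print(classify(kernels, ["stock", "price", "team"]))  # prints: finance
```

Averaging by document count in this sketch is one way to keep a large category from winning purely because it has more training texts, which is relevant to the imbalanced-corpus result above; whether VK handles this the same way is not stated in the abstract.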

Keywords/Search Tags:

text classification, feature selection, vector space model, gravitation field model, virtual kernel

Related items

1	On Research For Chinese Automatic Text Categorization Technology Based On VSM Model And Feature Selection
2	Research On Feature Selection Of Text Classification
3	Research Of Text Categorization Based On Vector Space Model
4	Research On KNN Text Classification
5	Research On Text Emotion Classification Based On Improved Feature Selection Method
6	Research And Implementation Of Text Classification Technology Based On Bayesian Theory
7	Study On Feature Selection And Feature Weighting Of Text Classification
8	Study On Some Chinese Text Classification Technology And Applications In Knowledge Extraction
9	Chinese Text Data Classification
10	Automatic Classification Research On Chinese Web Document Orientation