Research On SVM And Text Classification

Posted on:2007-07-19

Degree:Master

Type:Thesis

Country:China

Candidate:X X Niu

Full Text:PDF

GTID:2178360182480726

Subject:Computer application technology

Abstract/Summary:

With the rapid growth of the World Wide Web, the task of classifying natural language documents into a predefined set of semantic categories has become one of the key methods for organizing online information. This task is commonly referred to as text classification. The exponential growth in the number of online documents and the increasing pace at which information needs to be distributed have created the need for automatic document classification.

The approach presented in this thesis is based on the key insight that the margin, the complexity measure used in support vector machines (SVM), is ideal for text classification. The learning algorithm is given access to labeled training documents and produces a classification rule automatically. The main work is as follows:

1. After introducing text representations, feature selection, and criteria for evaluating predictive performance, we implement processing steps such as stemming, removal of high- and low-frequency words, and term weighting schemes to generate the feature dictionary and transform training and test documents into numerical vectors. A text classification experimental system based on SVM is then designed. Tested on the Reuters-21578 corpus, the system demonstrates that SVM can efficiently, effectively and provably solve the challenge of learning text classifiers from examples for a large and well-defined class of problems.

2. To address overfitting and long training times in SVM, an SVM combined with the subtractive clustering method is proposed. Subtractive clustering selects a set of cluster centers, which are themselves data samples, to represent the original massive training set. This reduced training set is then used to construct the support vector machines. Two benchmarks, one two-class recognition problem and one multi-class problem, are tested. The results show that the SVM based on subtractive clustering achieves classification accuracy and generalization ability equal to or better than those of conventional support vector machines, while using a smaller training set and less optimization time.
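The subtractive clustering step described above can be illustrated with a minimal sketch. It follows the commonly cited formulation of subtractive clustering (a density "potential" per sample, with the potential around each chosen center subtracted away), not necessarily the exact variant used in the thesis. The function name, the parameters `r_a`, `squash` and `stop_ratio`, and the simplified stopping rule are all assumptions made for illustration.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def subtractive_clustering(points, r_a=1.0, squash=1.5, stop_ratio=0.15):
    """Return indices of the samples chosen as cluster centers.

    r_a        : neighbourhood radius (assumed value, problem dependent)
    squash     : ratio r_b / r_a used when subtracting potential (assumed)
    stop_ratio : stop once the best remaining potential falls below this
                 fraction of the first peak (simplified stopping rule)
    """
    if not points:
        return []
    alpha = 4.0 / r_a ** 2
    beta = 4.0 / (squash * r_a) ** 2
    # Initial potential: how densely each sample is surrounded by others.
    potential = [
        sum(math.exp(-alpha * sq_dist(p, q)) for q in points) for p in points
    ]
    first_peak = max(potential)
    centers = []
    while True:
        k = max(range(len(points)), key=potential.__getitem__)
        peak = potential[k]
        if centers and peak < stop_ratio * first_peak:
            break
        centers.append(k)
        # Subtract the chosen center's influence so that nearby samples
        # are unlikely to be picked as further centers.
        potential = [
            p_i - peak * math.exp(-beta * sq_dist(x, points[k]))
            for p_i, x in zip(potential, points)
        ]
    return centers


# Toy data: two well-separated groups of 2-D samples.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1),
    (10.0, 10.0), (10.1, 10.0), (10.0, 10.1),
]
centers = subtractive_clustering(points)
reduced_training_set = [points[i] for i in centers]
```

In a setting like the one the abstract describes, `reduced_training_set` (together with the labels of the selected samples) would replace the full training set passed to the SVM trainer. Whether the selection is run per class or over the whole set is not stated in the abstract, so that choice is left open here.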

Keywords/Search Tags:

Text classification, Feature selection, Support vector machine, Subtractive clustering

Related items

1	Chinese Text Classification Based On Svm Algorithm Realization
2	Research On Text Classification Based-on Support Vector Machine
3	Research On Text Classification System Based On Support Vector Machine
4	The Design And Application Of SSVM's Text Classification Based On Feature Selection Optimization
5	Research On Web Text Classification Based On Support Vector Machines
6	Research On Chinese Text Classification System Based On Support Vector Machine
7	Study On Text Classification Based On Rough Set And Support Vector Machine
8	Research On Text Classification Method Based On Support Vector Machine
9	Research On Text Classification Based On Feature Selection And Its Application
10	Text Classification Based On Machine Learning