Research Of Text Categorization System Based On SVM

Posted on:2009-10-20

Degree:Master

Type:Thesis

Country:China

Candidate:M Zhang

Full Text:PDF

GTID:2178360248953618

Subject:Measuring and Testing Technology and Instruments

Abstract/Summary:

The growth of the Internet has produced a great deal of electronic text. How to obtain useful information from it quickly and efficiently by computer has become a hot research topic, and automatic text classification systems make this easier. Automatic text classification is an important part of information processing and is used in text identification, e-government, search engines, and information filtering. Improving classification accuracy is therefore significant for these applications.

This paper implements a text classification system based on the Support Vector Machine (SVM). Compared with traditional classification methods, SVM has many attractive features and performs strongly in small-sample, nonlinear, and high-dimensional pattern recognition. SVM follows the principle of structural risk minimization and yields a globally optimal solution. Given limited sample information, SVM seeks the best trade-off between model complexity and learning ability, so it can achieve good generalization and address the overfitting problem effectively. An SVM-based classifier can therefore generalize well and reach a high accuracy rate even with small samples.

This paper introduces the basic process of Chinese text classification and its primary techniques, such as text representation and feature selection. It focuses on the SVM classification algorithm, analyzes the factors that influence the results, and compares the classification results of different kernel functions. We implement an SVM-based text categorization system whose classifier supports multi-category classification. In text preprocessing, we use the ICTCLAS system for word segmentation and combine Document Frequency (DF) with Information Gain (MI) for feature selection, which avoids the disadvantages of using DF or MI alone. Unlike the usual approach, we use grid search to optimize the kernel function parameters. Finally, experiments show that the improved system achieves better results and a higher accuracy rate.
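
The combined feature selection described above can be illustrated with a minimal sketch. The abstract does not say how the two criteria are combined, so this example assumes one plausible scheme: terms are first filtered by a document-frequency threshold and the survivors are then ranked by information gain. The function names, the `min_df` and `top_k` parameters, and the toy English corpus are all hypothetical; in the thesis the tokens would come from ICTCLAS word segmentation.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(term, docs, labels):
    """IG(t) = H(C) - P(t) * H(C | t present) - P(not t) * H(C | t absent)."""
    with_t, without_t = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        (with_t if term in tokens else without_t)[label] += 1
    n = len(docs)
    n_with = sum(with_t.values())
    n_without = n - n_with
    h_conditional = 0.0
    if n_with:
        h_conditional += n_with / n * entropy(list(with_t.values()))
    if n_without:
        h_conditional += n_without / n * entropy(list(without_t.values()))
    return entropy(list(Counter(labels).values())) - h_conditional


def select_features(docs, labels, min_df=2, top_k=3):
    """Assumed combination: DF threshold first, then rank by information gain."""
    doc_sets = [set(tokens) for tokens in docs]
    df = document_frequency(docs)
    candidates = [t for t, count in df.items() if count >= min_df]
    # Sort by descending information gain; break ties alphabetically.
    ranked = sorted(
        candidates,
        key=lambda t: (-information_gain(t, doc_sets, labels), t),
    )
    return ranked[:top_k]


# Toy corpus of already-segmented documents (hypothetical data).
docs = [
    ["stock", "market", "rises"],
    ["stock", "bank", "report"],
    ["team", "wins", "match"],
    ["team", "match", "report"],
]
labels = ["finance", "finance", "sports", "sports"]

selected = select_features(docs, labels, min_df=2, top_k=3)
print(selected)
```

In this toy corpus the DF threshold removes terms that occur in only one document, and the information-gain ranking then places "report" last among the survivors because it appears equally often in both classes. The selected terms would then define the feature vectors passed to the SVM classifier.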

Keywords/Search Tags:

Text Classification, Support Vector Machine, Feature Selection, Grid Search

Related items

1	Research On Text Classification Based-on Support Vector Machine
2	Research On Text Classification System Based On Support Vector Machine
3	The Design And Application Of SSVM’s Text Classification Based On Feature Selection Optimization
4	Research On Web Text Classification Based On Support Vector Machines
5	Research On Chinese Text Classification System Based On Support Vector Machine
6	Study On Text Classification Based On Rough Set And Support Vector Machine
7	Research On Text Classification Method Based On Support Vector Machine
8	Research On Text Classification Based On Feature Selection And Its Application
9	Text Classification Based On Machine Learning
10	Research On Text Emotion Classification Based On Improved Feature Selection Method