Study Of The Multi-Class Text Classification Based-On SVM

Posted on:2011-07-16

Degree:Master

Type:Thesis

Country:China

Candidate:J H Li

GTID:2178330305460302

Subject:Computer application technology

Abstract/Summary:

Since the 1990s, the Internet has grown so dramatically that it now holds a huge amount of raw information, including text, sound, and images. Data mining needs to be applied to this text in order to extract useful, interesting, and potentially valuable patterns and hidden information from large, heterogeneous, and unstructured data sources. With the rapid growth of text data, text mining has become an important research direction in the data mining field.

Automatic text classification assigns documents to one or more categories automatically; it is a key technique in content-based automatic information management. Text vectors are high-dimensional and extremely sparse, and they contain many relevant features. SVMs are particularly well suited to text categorization and have great potential there, because they are robust to large numbers of relevant features and to sparse data, and they handle high-dimensional problems well. However, many research issues remain open for SVMs in text categorization, such as incremental learning, multi-label classification, and slow training and classification. The SVM was originally developed to solve binary classification problems, and how to extend it effectively to multi-class classification is still an ongoing research issue.

Among the available methods, the binary-tree multi-class text categorization algorithm based on SVM is more efficient than the others in training and classification, and it resolves the problem of inseparable regions, which makes it a good choice. To address the shortcomings of the binary-tree SVM, this thesis builds new binary trees to improve decision speed, and it adopts a cluster analysis method, motivated by the effect of the class distribution on inter-class separability, to improve the accuracy of the multi-class classifier. Finally, experiments are run on a corpus published on the Chinese Natural Language Processing Open Platform by Dr. Li Ronglu, using the system he created. The results are summarized and analyzed further, and they demonstrate the effectiveness of the improved methods.
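The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of a binary-tree multi-class SVM whose splits are chosen by clustering. Everything in it is an assumption made for illustration: the helper names (`split_classes`, `build_tree`, `predict`), the use of 2-means over class centroids as the cluster analysis step, the simple subgradient-descent linear SVM, and the 2-D toy vectors that stand in for real text feature vectors. It is not the thesis's implementation.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def dist2(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def centroid(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def train_linear_svm(X, y, epochs=50, eta=0.01, lam=0.01):
    """Subgradient descent on the L2-regularized hinge loss; labels are -1 or +1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * (dot(w, xi) + b) < 1
            w = [wj * (1 - eta * lam) for wj in w]        # regularization shrink
            if violated:                                   # hinge-loss step
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b

def split_classes(cents):
    """Assumed clustering step: 2-means over class centroids, seeded with the
    two most distant classes, so well-separated groups are split first."""
    labels = sorted(cents)
    pairs = ((a, b) for i, a in enumerate(labels) for b in labels[i + 1:])
    s1, s2 = max(pairs, key=lambda p: dist2(cents[p[0]], cents[p[1]]))
    mid = len(labels) // 2
    left, right = labels[:mid], labels[mid:]   # fallback if clustering degenerates
    m1, m2 = cents[s1], cents[s2]
    for _ in range(5):
        new_left = [c for c in labels if dist2(cents[c], m1) <= dist2(cents[c], m2)]
        new_right = [c for c in labels if c not in new_left]
        if not new_left or not new_right:
            break
        left, right = new_left, new_right
        m1 = centroid([cents[c] for c in left])
        m2 = centroid([cents[c] for c in right])
    return left, right

def build_tree(data):
    """data maps class label -> list of feature vectors.
    A leaf is a label; an internal node is (w, b, left_subtree, right_subtree)."""
    labels = sorted(data)
    if len(labels) == 1:
        return labels[0]
    left, right = split_classes({c: centroid(data[c]) for c in labels})
    X = [x for c in labels for x in data[c]]
    y = [1 if c in right else -1 for c in labels for _ in data[c]]
    w, b = train_linear_svm(X, y)
    return (w, b,
            build_tree({c: data[c] for c in left}),
            build_tree({c: data[c] for c in right}))

def predict(node, x):
    """Walk from the root to a leaf: one binary SVM decision per level."""
    while isinstance(node, tuple):
        w, b, left, right = node
        node = right if dot(w, x) + b > 0 else left
    return node

# Toy data: four classes, three 2-D vectors each (stand-ins for text vectors).
data = {
    "A": [(-5.0, -3.0), (-6.0, -3.0), (-4.0, -3.0)],
    "B": [(-5.0, 3.0), (-6.0, 3.0), (-4.0, 3.0)],
    "C": [(5.0, -3.0), (6.0, -3.0), (4.0, -3.0)],
    "D": [(5.0, 3.0), (6.0, 3.0), (4.0, 3.0)],
}
tree = build_tree(data)
predictions = [predict(tree, x) for x in [(-5.5, 3.5), (4.5, -3.5)]]
```

In this sketch a tree over k classes trains k - 1 binary classifiers, and a prediction evaluates at most one classifier per level on the path to a leaf. Because every input ends at exactly one leaf, no region of the feature space is left without a label.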

Keywords/Search Tags:

text mining, text classification, SVM, multi-class classification algorithm, cluster analysis

Related items

1	Study Of The Multi-class Text Classification Based-on Svm
2	Research On Class Semantics And Imbalanced Distribution Methods For Multi-Label Text Classification
3	Research On Text Classification Algorithm Based On Class-reorganization And Model Fusion
4	Research On Text Classification Of Web Text Mining
5	Study On Multi-class Text Classification Based On Support Vector Machines
6	Research Of Automatic Web Page Categorization And Cluster Based On Web Mining Technology
7	Research On Key Techniques Of Short-text Representation And Classification Based On Hybrid Semantic
8	Based On The Rapid Large-scale Text Hierarchical Classification Problem Of Centralized
9	Research On Text Classification Method Based On Support Vector Machine
10	Chinese Text Classification Method Based On Improved Topic Model