
Study of Multi-Class Text Classification Based on SVM

Posted on: 2011-07-16
Degree: Master
Type: Thesis
Country: China
Candidate: J H Li
GTID: 2178330305460302
Subject: Computer application technology
Abstract/Summary:
Since the 1990s, the Internet has grown so dramatically that it now contains a huge amount of raw information, including text, sound, and images. Data mining must be applied to text in order to extract useful, interesting patterns and hidden information from large, heterogeneous, and unstructured data sources. With the rapid growth of text data, text mining has become an important research direction within data mining.

Automatic text classification assigns documents to one or more categories automatically, and it is a key technique in content-based automatic information management. Text vectors are high dimensional and extremely sparse, and they contain many relevant features. SVMs are particularly well suited to text categorization because they are robust to large numbers of relevant features and to sparse data, and they handle high-dimensional problems well. However, many research issues remain open when SVMs are applied to text categorization, such as incremental learning, multi-label classification, and slow training and classification.

The SVM was originally developed to solve binary classification problems, and how to extend it effectively to multi-class classification is still an ongoing research issue. Among the available methods, the binary tree multi-class text categorization algorithm based on SVM is more efficient than the others in both training and classification, and it resolves the inseparable-region problem, so it is a good choice. To address the shortcomings of the binary tree SVM, new binary trees are constructed to improve the decision speed and the accuracy of the multi-class classifier. Because the distribution of the classes affects inter-class separability, a cluster analysis method is adopted.

Finally, experiments are carried out on a corpus published by Dr. Li Ronglu on the Chinese Natural Language Processing Open Platform, using the system he created. The experimental results are summarized and analyzed further, and they show that the improved methods are effective.
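The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of a binary tree of SVMs whose splits are chosen by clustering, not the thesis's actual algorithm. It makes several assumptions: classes are grouped by the distance between their centroids (a simple stand-in for the cluster analysis), each node uses a small linear SVM trained by subgradient descent on the hinge loss (a crude substitute for a real SVM solver), and all function names and the toy data are hypothetical.

```python
def centroid(vectors):
    """Mean vector of a list of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split_classes(cents):
    """Split class labels into two groups around the two most distant centroids."""
    labels = sorted(cents)
    pairs = ((p, q) for i, p in enumerate(labels) for q in labels[i + 1:])
    a, b = max(pairs, key=lambda pq: sq_dist(cents[pq[0]], cents[pq[1]]))
    left = [c for c in labels
            if sq_dist(cents[c], cents[a]) <= sq_dist(cents[c], cents[b])]
    right = [c for c in labels if c not in left]
    if not right:                      # degenerate case: identical centroids
        right = [left.pop()]
    return left, right

def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=500):
    """Linear SVM via subgradient descent on the hinge loss; y values are +1 or -1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1 - lr * lam) for wj in w]          # L2 shrinkage
            if margin < 1:                                 # hinge-loss step
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def build_tree(data):
    """data maps class label -> list of vectors. Leaves are labels, nodes are dicts."""
    labels = sorted(data)
    if len(labels) == 1:
        return labels[0]
    left, right = split_classes({c: centroid(data[c]) for c in labels})
    X = [x for c in labels for x in data[c]]
    y = [1 if c in left else -1 for c in labels for _ in data[c]]
    w, b = train_linear_svm(X, y)
    return {"w": w, "b": b,
            "left": build_tree({c: data[c] for c in left}),
            "right": build_tree({c: data[c] for c in right})}

def predict(node, x):
    """Walk from the root to a leaf, evaluating one binary SVM per level."""
    while isinstance(node, dict):
        score = sum(wj * xj for wj, xj in zip(node["w"], x)) + node["b"]
        node = node["left"] if score >= 0 else node["right"]
    return node

def count_svms(node):
    """Number of binary classifiers (internal nodes) in the tree."""
    if not isinstance(node, dict):
        return 0
    return 1 + count_svms(node["left"]) + count_svms(node["right"])

# Hypothetical toy data: four classes in a 2-D feature space.
data = {
    "A": [[-4.0, -1.0], [-2.0, -1.0]],
    "B": [[-4.0, 1.0], [-2.0, 1.0]],
    "C": [[2.0, -1.0], [4.0, -1.0]],
    "D": [[2.0, 1.0], [4.0, 1.0]],
}
tree = build_tree(data)
print(count_svms(tree), predict(tree, [-3.0, 0.9]))
```

A tree over k classes needs only k - 1 binary classifiers, and a prediction evaluates at most one classifier per level, which is the usual argument for the decision speed of tree-structured multi-class SVMs. Grouping nearby classes under the same subtree is meant to leave the easiest separations near the root. Real text data would use sparse, high-dimensional vectors and a proper SVM library rather than this toy trainer.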
Keywords/Search Tags: text mining, text classification, SVM, multi-class classification algorithm, cluster analysis