
Research On Text Classification Method Based On Support Vector Machine

Posted on: 2016-11-26
Degree: Master
Type: Thesis
Country: China
Candidate: X L Li
Full Text: PDF
GTID: 2348330542975819
Subject: Software engineering
Abstract/Summary:
We live in an era of information explosion: the Internet is developing rapidly and the amount of information on the network grows every day. To find what they need quickly among these massive resources, users need the information to be classified, which has given rise to classification technology. The application background of this thesis is text information processing, and it studies text classification in depth from both theoretical and applied perspectives. The thesis first reviews the background knowledge of text classification, then focuses on how support vector machines (SVMs) are constructed for binary and multi-class classification and analyzes their strengths and weaknesses. It then proposes an improved binary-classification SVM and an improved multi-class SVM, and shows through experiments that both work well for text categorization. The main work and contributions are as follows:

(1) An introduction to the technologies relevant to text classification, covering text preprocessing, feature selection and text representation. In this thesis, feature weights are calculated with the TF-IDF formula and texts are represented with the vector space model. After this background, several common text classification methods are introduced: Naïve Bayes, the class-center vector method, k-nearest neighbor and the support vector machine, together with an analysis of their advantages and disadvantages in classification.

(2) An introduction to the theory of the support vector machine, with an analysis of the advantages of SVMs in machine learning. The techniques involved in applying SVMs to classification are then described, such as mapping the data into another vector space to handle nonlinear problems and using kernel functions to cope with the curse of dimensionality. Finally, several common multi-class SVMs are introduced, such as the one-against-one SVM and the one-against-rest SVM, and their advantages and disadvantages are analyzed and compared.

(3) The first core contribution is an improved binary-classification SVM. The improved algorithm uses a two-stage informative pattern extraction approach whose purpose is to extract, before the classifier is built, a set of samples that includes all the support vectors; training the SVM on this set reduces training time while maintaining classification accuracy. The first stage is data cleaning based on bootstrap sampling, and the second stage is informative pattern extraction based on maximum information entropy. Experiments on four large datasets validate the effectiveness of the method in reducing the number of training samples and the computational cost.

(4) The second core contribution is an improved multi-class SVM. With the cluster centers of the positive class as reference points, instances are selected for each one-against-rest SVM model; the purpose is again to extract a set that includes all the support vectors, so that training the SVM on it takes less time. Clustering is used here to improve the efficiency of instance selection, rather than to select instances directly from the clusters as previous methods did. Experiments on a wide variety of datasets demonstrate that the proposed method selects fewer instances than the competing algorithms while keeping higher classification accuracy on most datasets.
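
The TF-IDF weighting and vector space representation mentioned in item (1) can be illustrated with a minimal sketch. The abstract does not state which TF-IDF variant the thesis uses, so the formula below (term frequency normalized by document length, multiplied by log(N / df)) is an assumption, and the function name `tfidf_vectors` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) for a list of tokenized documents.

    Assumed weighting (not taken from the thesis):
    tf = term count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([
            (counts[t] / length) * math.log(n_docs / df[t])
            for t in vocab
        ])
    return vocab, vectors


# Toy corpus (hypothetical): each document is already tokenized.
docs = [
    "svm text classification".split(),
    "text feature selection".split(),
    "svm kernel".split(),
]
vocab, vectors = tfidf_vectors(docs)
```

Each document becomes one vector with a component per vocabulary term. Terms that appear in fewer documents receive larger weights, and a term absent from a document has weight zero. Vectors of this kind are what a classifier such as an SVM would be trained on.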
Keywords/Search Tags: Text classification, Support vector machine, Instance selection, Binary classification, Multi-class classification