
Research On Chinese Text Classification Based On Support Vector Machine

Posted on: 2018-03-15  Degree: Master  Type: Thesis
Country: China  Candidate: M Y Yang  Full Text: PDF
GTID: 2348330515957500  Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technology, the volume of information is growing massively. How to extract useful information from this volume is an urgent problem. Information exists mainly in the form of text, and Chinese is one of the most widely used languages in the world, so research on Chinese text classification is of great significance. Text classification can organize and manage information efficiently and locate information quickly and accurately, which effectively addresses the problem of disordered information. Text classification data are characterized by high dimensionality, sparseness, and strong correlation among features. The support vector machine (SVM) handles these characteristics well and is therefore widely used in text classification. However, SVM has some disadvantages: classification slows down as the number of samples increases, and the parameters strongly influence learning performance and generalization ability. Traditional SVM parameter optimization methods suffer from weak search ability and low accuracy. To address these problems, this thesis studies SVM parameter optimization in detail in order to improve both the accuracy and the speed of text classification.

The main research contents are as follows. First, the thesis systematically summarizes the research background and significance of text classification, the current state of research in China and abroad, and future development prospects; it introduces the related theory and key technologies of text classification and compares the algorithms commonly used in text categorization. Experiments show that SVM is a relatively effective algorithm. Second, to address the difficulty of selecting SVM parameters, the firefly algorithm is introduced, and an improved firefly algorithm is proposed to optimize the SVM parameters. Experiments show that the improved firefly algorithm has stronger global search ability in the early stage and faster convergence in the later stage, so its overall performance is improved. Third, the improved firefly algorithm is applied to SVM parameter optimization, and the optimized parameters are used to train the SVM model. Finally, experiments compare the text classification results of the standard SVM and the improved SVM. The results show that the improved SVM model accelerates classification and improves classification accuracy, enhancing the classification performance of SVM and verifying the effectiveness of the improved algorithm.
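The abstract does not describe the improved firefly algorithm itself, so the following is only a minimal sketch of the standard firefly algorithm, not the thesis's variant. It assumes the two SVM parameters are searched on a log2 scale as (log2 C, log2 gamma), and it uses a hypothetical stand-in objective, `toy_cv_error`, in place of a real cross-validation error. The function names, bounds, and hyperparameter defaults are illustrative assumptions; normalizing distances by the search-box width is a design choice made here so that attraction stays meaningful over a wide search range.

```python
import math
import random


def firefly_minimize(objective, bounds, n_fireflies=15, n_iter=60,
                     alpha=0.2, beta0=1.0, gamma=1.0, seed=0):
    """Standard firefly algorithm for minimization (illustrative sketch only)."""
    rng = random.Random(seed)
    # Random initial positions inside the search box.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_fireflies)]
    cost = [objective(p) for p in pos]
    best_i = min(range(n_fireflies), key=cost.__getitem__)
    best_pos, best_cost = list(pos[best_i]), cost[best_i]

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # j is brighter (lower cost): i moves toward j
                    # Squared distance, normalized by the box width per dimension.
                    r2 = sum(((a - b) / (hi - lo)) ** 2
                             for a, b, (lo, hi) in zip(pos[i], pos[j], bounds))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                    for d, (lo, hi) in enumerate(bounds):
                        step = alpha * (rng.random() - 0.5) * (hi - lo)  # random perturbation
                        x = pos[i][d] + beta * (pos[j][d] - pos[i][d]) + step
                        pos[i][d] = min(max(x, lo), hi)  # clamp to the search box
                    cost[i] = objective(pos[i])
                    if cost[i] < best_cost:
                        best_pos, best_cost = list(pos[i]), cost[i]
        alpha *= 0.97  # shrink the random step so the swarm settles over time
    return best_pos, best_cost


def toy_cv_error(p):
    """Hypothetical stand-in for cross-validation error over (log2 C, log2 gamma)."""
    return 0.05 + 0.01 * ((p[0] - 3.0) ** 2 + (p[1] + 5.0) ** 2)


# Assumed search ranges for log2 C and log2 gamma.
bounds = [(-5.0, 15.0), (-15.0, 3.0)]
best_pos, best_cost = firefly_minimize(toy_cv_error, bounds)
print(best_pos, best_cost)
```

In a real experiment, `toy_cv_error` would be replaced by a function that trains an SVM with the candidate parameters and returns its validation error on the text corpus.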
Keywords/Search Tags:text classification, SVM, parameter optimization, Firefly algorithm