Research On Chinese Text Classification Based On SA-SVM

Posted on:2020-02-12

Degree:Master

Type:Thesis

Country:China

Candidate:C L Guo

Full Text:PDF

GTID:2438330572999547

Subject:Software engineering

Abstract/Summary:

With the rapid development of information technology, the volume of data and resources on the Internet has grown enormously. This information, however, is disorganized, which often leaves people unsure where to begin. Intelligent information retrieval, information filtering, and data mining have emerged to manage and exploit this information effectively, and text categorization is their most important underlying support. Text categorization uses computer technology to automatically assign texts that share the same characteristics to a predefined category system according to their content. It makes information easier to manage and use, and it has broad application prospects.

The classification algorithm is the core of text categorization, and researchers studying Chinese text categorization have proposed many effective ones. Traditional machine-learning classifiers include Bayesian classifiers, k-nearest neighbors (KNN), logistic regression, decision trees, and the support vector machine (SVM). A large body of experimental work shows that SVM has strong learning and generalization ability in Chinese text categorization. Analysis of the SVM algorithm's principle, together with experimental examples, shows that the performance of SVM-based text classification depends closely on the penalty factor and the kernel function parameter, so the choice of these two parameters directly affects classification accuracy.

To address the shortcomings of traditional methods for optimizing SVM parameters, theoretical analysis and experimental verification show that the simulated annealing (SA) algorithm has strong global search ability in three-dimensional space. This thesis therefore proposes optimizing the SVM parameters with SA and compares the resulting SA-SVM model against several other optimization algorithms on several standard UCI datasets. The results show that, when searching for the optimal SVM parameters, SA-SVM uses the probabilistic jumps produced by its random perturbations to escape local optima and find the globally optimal parameters, which gives the model good classification performance.

To demonstrate the practical value of the SA-SVM classification model, it is applied to Chinese text classification, using the Fudan University Chinese text corpus and the Sogou Chinese text corpus as experimental datasets and comparing it with several commonly used classification algorithms. The experimental results show that, compared with other Chinese text classification algorithms, the proposed SA-SVM model has strong generalization ability, achieves good classification results, and delivers notably better classification performance.
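The abstract does not give the thesis's SA settings, so the following is only a minimal sketch under stated assumptions of how simulated annealing can tune the two SVM parameters it names. Assumed here: the search runs over log2 C (penalty factor) and log2 gamma (RBF kernel parameter) inside a fixed box; each step applies a Gaussian random perturbation; worse points are accepted by the Metropolis rule exp(delta / T); and the temperature follows geometric cooling. The function names `simulated_annealing` and `svm_cv_objective`, the bounds, and every numeric default are hypothetical illustrations, not the author's values. The sketch searches only the two named parameters. The optional objective uses scikit-learn's `SVC` and `cross_val_score`.

```python
import math
import random
from typing import Callable, Tuple

Bounds = Tuple[Tuple[float, float], Tuple[float, float]]


def simulated_annealing(
    objective: Callable[[float, float], float],
    bounds: Bounds = ((-5.0, 15.0), (-15.0, 3.0)),  # assumed ranges for (log2 C, log2 gamma)
    t_start: float = 1.0,
    t_end: float = 1e-3,
    alpha: float = 0.95,
    steps_per_temp: int = 20,
    step_size: float = 1.0,
    seed: int = 0,
) -> Tuple[Tuple[float, float], float]:
    """Maximise objective(log2_C, log2_gamma) by simulated annealing (hypothetical sketch).

    Returns ((log2_C, log2_gamma), score) for the best point visited.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1) so the temperature decreases")
    rng = random.Random(seed)
    (c_lo, c_hi), (g_lo, g_hi) = bounds

    def clip(x: float, lo: float, hi: float) -> float:
        return max(lo, min(hi, x))

    # Start from a random point inside the search box.
    current = (rng.uniform(c_lo, c_hi), rng.uniform(g_lo, g_hi))
    current_score = objective(*current)
    best, best_score = current, current_score

    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            # Random disturbance: a Gaussian step whose size shrinks as T falls
            # (floored at 10% of step_size so the search never freezes completely).
            scale = step_size * max(t / t_start, 0.1)
            candidate = (
                clip(current[0] + rng.gauss(0.0, scale), c_lo, c_hi),
                clip(current[1] + rng.gauss(0.0, scale), g_lo, g_hi),
            )
            score = objective(*candidate)
            delta = score - current_score  # positive means the candidate is better
            # Metropolis rule: always accept improvements, and accept a worse
            # candidate with probability exp(delta / T). These occasional moves
            # to worse points are what let SA leave a local optimum.
            if delta >= 0 or rng.random() < math.exp(delta / t):
                current, current_score = candidate, score
                if score > best_score:
                    best, best_score = candidate, score
        t *= alpha  # geometric cooling schedule
    return best, best_score


def svm_cv_objective(X, y, folds: int = 5) -> Callable[[float, float], float]:
    """Hypothetical objective: mean k-fold accuracy of an RBF-kernel SVM (needs scikit-learn)."""
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    def objective(log2_c: float, log2_gamma: float) -> float:
        model = SVC(kernel="rbf", C=2.0 ** log2_c, gamma=2.0 ** log2_gamma)
        return float(cross_val_score(model, X, y, cv=folds).mean())

    return objective
```

For Chinese text, `X` would typically be a TF-IDF matrix built from word-segmented documents (for example, segmented with jieba and vectorized with scikit-learn's `TfidfVectorizer`), and the search would be `(log2_c, log2_gamma), acc = simulated_annealing(svm_cv_objective(X, y))`. With the defaults above the loop evaluates the objective about 2,700 times, and each evaluation trains `folds` SVMs. On real corpora you would probably shorten the schedule, for example with a smaller `steps_per_temp` or a larger `t_end`.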

Keywords/Search Tags:

simulated annealing algorithm, SVM, parameter optimization, Chinese text categorization

Related items

1	Research Of Text Categorization Based On The Theme Mining And Covering Algorithm
2	The Studies On Chinese Text Categorization Based On PSO And SVM
3	Research And Implementation Of The Automatic Chinese Text Categorization
4	Research And Implementation On Web Chinese Text Categorization Technology
5	The Analysis And Application Research Of Optimizing Simulated Annealing Algorithm
6	Application Of A Modified PSO Algorithm Combining With GA Operators In Control System Design
7	Research Of Chinese Web Text Categorization Based On KNN Algorithm
8	Research Of Function Optimization Of PSO Based On Simulated Annealing
9	Research Of Chinese Text Categorization Algorithms Based On Information Entropy
10	Research And Application Of Co-evolution Algorithm