
Research On Chinese Text Classification Based On SA-SVM

Posted on: 2020-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: C L Guo
Full Text: PDF
GTID: 2438330572999547
Subject: Software engineering
Abstract/Summary:
With the rapid development of information technology, the volume of data and resources on the Internet has grown by orders of magnitude. This mass of information is disorderly, which often leaves people unsure where to begin when they try to use it. To manage and exploit it effectively, intelligent information retrieval, information filtering and data mining have emerged, and text categorization is the most important technique supporting them. Text categorization uses computer technology to automatically assign texts that share the same characteristics to a predefined category system according to their content. It makes information easier to manage and use, and it has broad application prospects.

The classification algorithm is the core of text categorization, and researchers studying Chinese text categorization have contributed many effective algorithms. Traditional machine learning classifiers include the Bayes algorithm, the KNN algorithm, logistic regression, decision trees and the support vector machine (SVM). A large number of experimental studies have shown that SVM has strong learning and generalization ability in Chinese text categorization. Analysis of the principle of the SVM algorithm, together with experimental examples, shows that the performance of SVM-based text classification is closely tied to two settings, the penalty factor and the kernel function parameter, whose values directly affect classification accuracy.

To address the shortcomings of traditional methods for optimizing SVM parameters, theoretical analysis and experimental verification are used to show that the simulated annealing (SA) algorithm has strong global search ability in three-dimensional space. This paper proposes a method that optimizes the SVM parameters with SA, and compares it with several other optimization algorithms on several groups of standard UCI datasets. The results show that, because of the probabilistic jumps introduced by its random perturbation, the SA-SVM model can escape local optima and find globally optimal parameters when searching for the SVM parameters, which gives the model good classification performance.

To demonstrate the practical value of the SA-SVM classification model, the model is applied to Chinese text classification. The Fudan University Chinese text corpus and the Sogou Chinese text corpus are used as experimental datasets, and the classification performance of SA-SVM is verified by comparison with several commonly used classification algorithms. The experimental results show that the proposed SA-SVM model generalizes better than the other Chinese text classification algorithms compared and achieves better classification results.
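The parameter search described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the cooling schedule, the search over log-scaled values of the penalty factor and kernel parameter, and the toy objective are all assumptions made for illustration. In a real experiment the `score` function would be something like the cross-validated accuracy of an SVM trained with the candidate parameters; here a synthetic surface with one local and one global peak stands in for it so the example runs with the standard library alone.

```python
import math
import random


def simulated_annealing(score, start, bounds, t_start=1.0, t_end=1e-3,
                        cooling=0.95, steps_per_temp=20, step_size=0.5, seed=0):
    """Maximize score(log_c, log_gamma) with a basic simulated annealing loop.

    All parameter names and default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    current = tuple(start)
    current_score = score(*current)
    best, best_score = current, current_score
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            # Random perturbation of the current point, clipped to the search box.
            candidate = tuple(
                min(hi, max(lo, x + rng.gauss(0.0, step_size)))
                for x, (lo, hi) in zip(current, bounds)
            )
            cand_score = score(*candidate)
            delta = cand_score - current_score
            # Metropolis rule: always accept an improvement; accept a worse
            # point with probability exp(delta / t), which shrinks as t cools.
            if delta >= 0 or rng.random() < math.exp(delta / t):
                current, current_score = candidate, cand_score
                if current_score > best_score:
                    best, best_score = current, current_score
        t *= cooling  # geometric cooling schedule
    return best, best_score


def toy_score(log_c, log_gamma):
    # Hypothetical stand-in for cross-validated SVM accuracy:
    # a global peak near (2, -3) and a lower local peak near (-2, 1).
    main = math.exp(-((log_c - 2.0) ** 2 + (log_gamma + 3.0) ** 2) / 2.0)
    local = 0.6 * math.exp(-((log_c + 2.0) ** 2 + (log_gamma - 1.0) ** 2) / 2.0)
    return main + local


SEARCH_BOX = [(-5.0, 5.0), (-5.0, 5.0)]
# Start on the local peak so that any improvement requires leaving it.
best, best_score = simulated_annealing(toy_score, start=(-2.0, 1.0), bounds=SEARCH_BOX)
print("best (log C, log gamma):", best, "score:", best_score)
```

The search starts on the lower peak on purpose: a purely greedy search would stay there, whereas the acceptance of occasional worse moves at high temperature is what gives simulated annealing a chance to leave it. Whether it does so on a given problem depends on the temperature schedule and step size, which would need tuning for real data.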
Keywords/Search Tags: simulated annealing algorithm, SVM, parameter optimization, Chinese text categorization