
Research On Text Classification Based On Support Vector Machine

Posted on: 2020-11-29 | Degree: Master | Type: Thesis
Country: China | Candidate: X H He
GTID: 2438330620455592 | Subject: Communication and Information System
Abstract/Summary:
With the large-scale use of the Internet and mobile devices, data has grown exponentially, and processing massive amounts of data has become an urgent and important problem. Text classification is an important research direction in natural language processing: it makes it possible to manage information efficiently, locate and categorize data quickly, and process it further, which helps reduce information disorder. Because most information in China is presented as Chinese text, Chinese text classification is of considerable research significance. This thesis first introduces the theoretical foundations of machine-learning-based Chinese text classification, focusing on the feature processing methods and classifiers that are currently in wide use. It then studies the principle and workflow of the Support Vector Machine (SVM) algorithm in depth and discusses the choice of basic method and kernel function.

Because the SVM penalty factor and the radial basis function (RBF) kernel parameter are difficult to select, particle swarm optimization (PSO) is introduced to optimize them. To address the shortcomings of standard PSO, three improvements are made: the inertia weight is changed to a nonlinearly decreasing inertia weight, asynchronously varying learning factors are introduced, and the handling of particles that cross the search boundary is improved. Comparative experiments on machine learning data sets from the UCI repository show that the improved PSO achieves higher classification accuracy, confirming its effectiveness for optimizing SVM parameters. Next, the influence of the exponent n in the nonlinear decreasing inertia weight equation on the improved algorithm is analyzed: the SVM parameters are optimized by the improved PSO with different values of n, and the resulting parameters are used to train the model. From the SVM text classification results, the best exponent value for this experimental environment is identified, which puts the improved PSO in its optimal state for that environment.

Finally, using the Chinese text data set provided by Fudan University, the improved PSO, the linearly decreasing inertia weight PSO, and the standard PSO are each used to optimize the SVM parameters, and the optimized parameters are used to train SVM models. In the text classification comparison experiments, each trained classifier is tested and evaluated on the test set. The results show that, compared with the SVM models tuned by the linearly decreasing inertia weight PSO and by the standard PSO, the SVM model tuned by the improved PSO achieves a significant improvement in accuracy, recall, and F1 score. This demonstrates both that the SVM classifier tuned with the improved PSO has better classification performance and that the improved PSO is effective for SVM parameter optimization.
Keywords: Text Categorization, Support Vector Machine, Parameter Optimization, PSO Algorithm
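The abstract names three changes to standard PSO but does not give their equations. The following is a minimal Python sketch under stated assumptions, not the thesis's exact method. The nonlinear inertia weight is assumed to take the form w(t) = w_max - (w_max - w_min)(t/T)^n, which reduces to the linearly decreasing weight when n = 1. The asynchronous learning factors are assumed to follow the common time-varying schedule in which c1 falls from 2.5 to 0.5 while c2 rises from 0.5 to 2.5. Out-of-bound coordinates are re-sampled uniformly inside the range, which is a hypothetical stand-in for the thesis's boundary handling. All function names, default constants, and search ranges are illustrative. The swarm searches over (log2 C, log2 gamma) for an RBF-kernel SVM, using ranges borrowed from common LIBSVM grid-search practice. To keep the example dependency-free, a toy quadratic stands in for the cross-validated accuracy that would serve as the fitness in practice.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def inertia_weight(t: int, t_max: int, n: float,
                   w_max: float = 0.9, w_min: float = 0.4) -> float:
    """Assumed nonlinear decreasing weight; n = 1 gives the linear schedule."""
    return w_max - (w_max - w_min) * (t / t_max) ** n


def learning_factors(t: int, t_max: int) -> Tuple[float, float]:
    """Assumed asynchronous schedule: c1 falls 2.5 -> 0.5, c2 rises 0.5 -> 2.5."""
    frac = t / t_max
    c1 = 2.5 + (0.5 - 2.5) * frac  # cognitive (personal-best) pull weakens
    c2 = 0.5 + (2.5 - 0.5) * frac  # social (global-best) pull strengthens
    return c1, c2


def repair(x: float, lo: float, hi: float, rng: random.Random) -> float:
    """Hypothetical boundary handling: re-sample an out-of-range coordinate
    uniformly instead of pinning it onto the boundary."""
    if lo <= x <= hi:
        return x
    return rng.uniform(lo, hi)


def improved_pso(fitness: Callable[[List[float]], float], bounds: Bounds,
                 n_particles: int = 20, t_max: int = 100, n: float = 2.0,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Maximise `fitness` over the box `bounds`; returns (best position, best score)."""
    rng = random.Random(seed)
    dim = len(bounds)
    v_max = [0.2 * (hi - lo) for lo, hi in bounds]  # assumed clamp: 20% of each range
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    p_best = [p[:] for p in pos]
    p_score = [fitness(p) for p in pos]
    g_idx = max(range(n_particles), key=lambda i: p_score[i])
    g_best, g_score = p_best[g_idx][:], p_score[g_idx]

    for t in range(t_max):
        w = inertia_weight(t, t_max, n)
        c1, c2 = learning_factors(t, t_max)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (p_best[i][d] - pos[i][d])
                             + c2 * r2 * (g_best[d] - pos[i][d]))
                vel[i][d] = max(-v_max[d], min(v_max[d], vel[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = repair(pos[i][d] + vel[i][d], lo, hi, rng)
            score = fitness(pos[i])
            if score > p_score[i]:  # update personal and global bests
                p_best[i], p_score[i] = pos[i][:], score
                if score > g_score:
                    g_best, g_score = pos[i][:], score
    return g_best, g_score


def toy_fitness(x: List[float]) -> float:
    """Stand-in for CV accuracy of an RBF-SVM with C = 2**x[0], gamma = 2**x[1]."""
    return -((x[0] - 3.0) ** 2 + (x[1] + 5.0) ** 2)


if __name__ == "__main__":
    best, score = improved_pso(toy_fitness, [(-5.0, 15.0), (-15.0, 3.0)])
    print(f"C = {2 ** best[0]:.3f}, gamma = {2 ** best[1]:.5f}, fitness = {score:.4f}")
```

If scikit-learn is available, the stand-in could be replaced by something like `lambda x: cross_val_score(SVC(kernel="rbf", C=2 ** x[0], gamma=2 ** x[1]), X, y, cv=5).mean()`, where `X` and `y` are the vectorized training texts and their labels. Varying the `n` argument reproduces the kind of exponent study described in the abstract. Searching in log2 space lets the swarm cover several orders of magnitude of C and gamma evenly.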