Research On Text Classification Algorithm Based On SVM

Posted on: 2022-01-28   Degree: Master   Type: Thesis
Country: China   Candidate: Y Zhou   Full Text: PDF
GTID: 2518306560955539   Subject: Computer technology
Abstract/Summary:
In the Internet era, representing, storing, transmitting, and using massive amounts of text has become routine, yet the sheer volume of information creates its own problem: information is abundant, but knowledge is scarce. How to extract potentially useful value from massive data is a hot research issue. Most of the data people encounter in daily work and life exists as text, and there is a long-standing demand for an efficient tool that organizes and manages this text according to the characteristics of its subject matter. Studying text classification is therefore necessary.

Many text classification algorithms exist, but they share some problems: too many text features lead to the "curse of dimensionality" and to long training and classification times. The support vector machine (SVM), which is grounded in statistical learning theory, is insensitive to feature correlation and sparsity, generalizes well, and has clear advantages on high-dimensional data. SVM is therefore well suited to text classification and has important research value in this area.

The main work of this dissertation is as follows. First, it introduces the development of text classification research and related technologies, compares the advantages and disadvantages of classification algorithms with reference to text classification performance evaluation metrics, and analyzes the room for improving the traditional binary tree classification algorithm in computational cost and accuracy. It then proposes an improved binary tree SVM classification algorithm. The basic idea is that the child nodes closer to the root node are split first, so that a weighted binary tree is built more quickly in a top-down, easy-before-difficult order, improving both the time efficiency and the accuracy of classification.

Based on the improved binary tree SVM classification algorithm, an improved text classifier model is designed by calling the LIBSVM package from MATLAB. The Reuters-21578 data set is selected as the test data and normalized. On top of cross-validation, a genetic algorithm is used to further optimize the penalty parameter C and the kernel function parameter g. The accuracy, training and classification time, recall, and F1 value of the improved text classifier are tested and compared with three traditional SVM text classification methods. The results show that the proposed method outperforms the traditional methods in accuracy, training and classification time, recall, and F1 value.
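The top-down, easy-before-difficult construction described above can be illustrated with a minimal Python sketch. The abstract does not state how "easy" is measured, so the separability measure here (distance from a class centroid to the mean of the other remaining class centroids) is an assumption, and the function names `centroid` and `easy_first_order` and the toy data are hypothetical. The sketch only derives the order in which classes would be split off; in an actual classifier each split would be a binary SVM trained as "this class versus the remaining classes".

```python
from math import dist


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def easy_first_order(samples):
    """Return class labels in the order they would be split off the tree.

    samples: dict mapping class label -> list of feature vectors.
    ASSUMPTION: a class is "easy" when its centroid lies far from the mean
    of the other remaining class centroids. The thesis may use a different
    measure; this one is only for illustration.
    """
    centroids = {label: centroid(vecs) for label, vecs in samples.items()}
    remaining = sorted(centroids)  # sorted for deterministic tie-breaking
    order = []
    while len(remaining) > 1:
        def separability(label):
            others = [centroids[o] for o in remaining if o != label]
            return dist(centroids[label], centroid(others))

        # The most separable class is split off nearest the root.
        easiest = max(remaining, key=separability)
        order.append(easiest)
        remaining.remove(easiest)
    order.extend(remaining)  # the last class needs no node of its own
    return order


# Toy two-dimensional data (hypothetical, not taken from Reuters-21578).
toy = {
    "grain": [[0.0, 0.0], [0.2, 0.0]],
    "trade": [[1.0, 0.0], [1.2, 0.0]],
    "crude": [[10.0, 0.0], [10.2, 0.0]],
}
order = easy_first_order(toy)
print(order[0])  # the class separated at the root
```

In this toy data, "crude" lies far from the other two classes, so it is split off at the root, and the two classes that are harder to tell apart are left for the deepest node. An easy-first order of this kind keeps early mistakes from propagating down the tree.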
Keywords/Search Tags:Text classification, Support vector machine, Binary tree, Kernel function