
Research On Classification Of File Text Data Based On SVM

Posted on: 2019-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Lin
GTID: 2428330548464151
Subject: Computer application technology
Abstract/Summary:
With the rapid development of modern information technology and the emergence of large numbers of electronic documents, building digital archives has become a focus of the archives industry in China. In an increasingly information-driven and intelligent era, the text of traditional paper archives must gradually be converted into digital form for preservation, and digitizing paper archives is a necessary step in the information age. Research on archive classification and on the construction of digital archives is therefore of great practical significance.

This study is based on the support vector machine (SVM), a statistical classification method. Its background is the construction of the Gansu archival management resource platform, and it focuses on researching and designing a text classification algorithm for archival text data. The archival text classification system has two main parts: training and classification testing. The training process consists of (1) word segmentation, in which the text is segmented with a modified forward maximum matching algorithm; (2) feature selection, implementing the document frequency (DF) and chi-square methods; (3) weight calculation, implementing TF and TF*IDF weighting; and (4) classifier construction, implementing an SVM-based text classification algorithm for archival documents. During classifier design, cross-validation is used to optimize the classifier's parameters, and a good parameter set is selected for testing on the test samples so that the classification model meets the accuracy required in practice. The aim of this work is to reduce the time cost of SVM classification and to improve classification accuracy. By comparing the final accuracy, recall, F1 score, and time efficiency, a method that performs well on the data set was selected; after parameter tuning during classifier design, it was applied to the archival text mining platform and achieved good results in practical use.
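Step (1) relies on forward maximum matching. The abstract says a modified version was used but does not describe the modification, so the following is only a minimal sketch of plain forward maximum matching: scan left to right and always take the longest dictionary word that starts at the current position. The dictionary `DICTIONARY` and the function name `forward_max_match` are hypothetical, and falling back to a single character for unknown text is an assumption.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Plain forward maximum matching: greedily take the longest dictionary
    word starting at the current position, moving left to right."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first, shrinking it one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Assumed fallback: a single character is emitted even if it is not a dictionary word.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary; a real system would load a full lexicon.
DICTIONARY = {"档案", "数字", "数字档案", "文本", "分类", "管理"}
print(forward_max_match("数字档案文本分类", DICTIONARY))  # ['数字档案', '文本', '分类']
```

Because matching is greedy, the four-character entry "数字档案" wins over the shorter "数字"; that greediness is also the usual source of segmentation errors which modified variants try to correct.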
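Step (2) names document frequency and chi-square feature selection. The abstract does not say how the two were combined, so this sketch assumes one common arrangement: drop terms whose DF falls below a threshold, score the remaining terms with the chi-square statistic against each class, keep each term's maximum score, and retain the top `k`. The names `select_features`, `min_df`, `k` and the toy documents are hypothetical.

```python
from collections import Counter


def document_frequency(docs):
    """DF(t): number of documents that contain term t at least once."""
    df = Counter()
    for tokens, _label in docs:
        df.update(set(tokens))
    return df


def chi_square(docs, term, label):
    """Chi-square statistic between 'document contains term' and 'document is in class label'."""
    a = b = c = d = 0  # 2x2 contingency counts
    for tokens, doc_label in docs:
        has_term = term in tokens
        in_class = doc_label == label
        if has_term and in_class:
            a += 1  # term present, in class
        elif has_term:
            b += 1  # term present, other class
        elif in_class:
            c += 1  # term absent, in class
        else:
            d += 1  # term absent, other class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(docs, k, min_df=1):
    """Drop terms with DF below min_df, score the rest by their maximum chi-square
    over all classes, and return the k best (ties broken alphabetically)."""
    df = document_frequency(docs)
    labels = {label for _tokens, label in docs}
    candidates = [t for t, count in df.items() if count >= min_df]
    score = {t: max(chi_square(docs, t, lab) for lab in labels) for t in candidates}
    return sorted(candidates, key=lambda t: (-score[t], t))[:k]


# Hypothetical pre-segmented archive snippets with class labels.
docs = [
    (["archive", "transfer", "catalog"], "admin"),
    (["archive", "transfer"], "admin"),
    (["salary", "archive"], "finance"),
    (["salary", "report"], "finance"),
]
print(select_features(docs, k=2))  # ['salary', 'transfer']
```

"archive" appears in both classes, so it scores lower than "salary" and "transfer", each of which occurs in exactly one class.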
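Step (3) mentions TF and TF*IDF weights without giving formulas. The sketch below assumes common textbook choices: TF as the raw term count, IDF as $\log(N/\mathrm{DF}(t))$ over the training documents, and L2 normalization of the TF*IDF vector. The function names and the toy corpus are hypothetical, and other variants (smoothed IDF, length-normalized TF) are equally plausible.

```python
import math
from collections import Counter


def tf_vector(tokens, vocabulary):
    """TF weights: raw count of each vocabulary term in the document."""
    counts = Counter(tokens)
    return [float(counts[t]) for t in vocabulary]


def idf_weights(corpus, vocabulary):
    """idf(t) = log(N / df(t)) over the training corpus; unseen terms get 0."""
    n = len(corpus)
    doc_sets = [set(tokens) for tokens in corpus]
    idf = []
    for t in vocabulary:
        df = sum(1 for s in doc_sets if t in s)
        idf.append(math.log(n / df) if df else 0.0)
    return idf


def tfidf_vector(tokens, vocabulary, idf):
    """TF*IDF weights, scaled to unit L2 length so long documents do not dominate."""
    weights = [tf * w for tf, w in zip(tf_vector(tokens, vocabulary), idf)]
    norm = math.sqrt(sum(x * x for x in weights))
    return [x / norm for x in weights] if norm else weights


corpus = [["archive", "transfer", "transfer"], ["archive", "salary"]]
vocabulary = ["archive", "salary", "transfer"]  # e.g. the terms kept by feature selection
idf = idf_weights(corpus, vocabulary)
print(tf_vector(corpus[0], vocabulary))          # [1.0, 0.0, 2.0]
print(tfidf_vector(corpus[0], vocabulary, idf))  # approximately [0.0, 0.0, 1.0]
```

"archive" occurs in every document, so its IDF is 0 and TF*IDF removes it entirely, while plain TF still gives it weight 1.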
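Step (4) and the parameter search use an SVM tuned by cross-validation. The abstract names neither the kernel, the solver, nor the parameters being tuned, so the sketch swaps in Pegasos (a stochastic subgradient solver for the linear SVM objective) to stay within the standard library, and uses k-fold cross-validation to choose the regularization strength λ, which plays a role inverse to the usual C. It handles two classes with labels ±1 only; multi-class archive categories would need a one-vs-rest or one-vs-one wrapper, which is not shown. All names, the grid, and the toy data are hypothetical.

```python
import random


def train_linear_svm(xs, ys, lam, epochs=20, seed=0):
    """Linear SVM trained with Pegasos (stochastic subgradient descent on the
    L2-regularized hinge loss). xs: list of feature lists; ys: labels in {-1, +1}.
    A constant 1.0 feature is appended so the last weight acts as a bias."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Regularization step: shrink every weight toward zero.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge-loss subgradient step, only when the margin is violated.
            if margin < 1:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Sign of the decision function w . [x, 1]."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def k_fold_accuracy(xs, ys, lam, k=3):
    """Mean held-out accuracy over k interleaved folds (sample i goes to fold i % k)."""
    n = len(xs)
    scores = []
    for fold in range(k):
        test_idx = list(range(fold, n, k))
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        w = train_linear_svm([xs[i] for i in train_idx], [ys[i] for i in train_idx], lam)
        correct = sum(predict(w, xs[i]) == ys[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k


def select_lambda(xs, ys, grid, k=3):
    """Return the grid value with the best cross-validated accuracy (first one on ties)."""
    return max(grid, key=lambda lam: k_fold_accuracy(xs, ys, lam, k))


# Hypothetical, linearly separable toy data (two features, labels +1 / -1).
xs = [[2, 2], [3, 1], [2, 3], [3, 3], [1, 2], [2, 1],
      [-2, -2], [-3, -1], [-2, -3], [-3, -3], [-1, -2], [-2, -1]]
ys = [1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1]
print(select_lambda(xs, ys, grid=[0.01, 0.1, 1.0]))
```

On this separable toy data every candidate reaches a cross-validated accuracy of 1.0, so `max` returns the first grid value. Real archival data would separate the candidates, and the selected λ would then be retrained on all training data before evaluating the test samples.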
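The methods are compared on accuracy, recall, F1 score, and time efficiency. As a minimal sketch of the quality metrics, the function below computes overall accuracy together with per-class precision, recall, and F1, plus their unweighted (macro) averages; macro averaging is an assumption, since the abstract does not say how per-class scores were combined. The function name and labels are hypothetical.

```python
def evaluate(y_true, y_pred):
    """Overall accuracy plus per-class (precision, recall, F1) and their macro averages."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != lab and p == lab)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p != lab)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[lab] = (precision, recall, f1)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Macro average: unweighted mean of each metric over classes.
    macro = tuple(sum(v[i] for v in per_class.values()) / len(labels) for i in range(3))
    return accuracy, per_class, macro


# Hypothetical gold labels and predictions for four test documents.
y_true = ["admin", "admin", "finance", "finance"]
y_pred = ["admin", "finance", "finance", "finance"]
accuracy, per_class, macro = evaluate(y_true, y_pred)
print(accuracy)  # 0.75
```

Here "admin" has precision 1.0 but recall 0.5, while "finance" has recall 1.0 but precision 2/3, which shows why F1 is reported alongside accuracy. Time efficiency could be compared separately by timing training and prediction, for example with `time.perf_counter()`.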
Keywords/Search Tags: Digital Archives, File Text Classification, SVM, Cross Validation