Research On Classification Of File Text Data Based On SVM

Posted on:2019-01-22

Degree:Master

Type:Thesis

Country:China

Candidate:Y Lin

Full Text:PDF

GTID:2428330548464151

Subject:Computer application technology

Abstract/Summary:

With the rapid development of modern information technology and the emergence of large numbers of electronic documents, the construction of digital archives has become a focus of the archives industry in China. As information systems become more pervasive and intelligent, the text of traditional paper archives must gradually be converted into digital form for preservation; digitizing paper archives is a necessary step in the information age. Research on file classification and on the construction of digital archives is therefore of great practical significance. This study uses the support vector machine (SVM), a statistical classification method. It is set against the construction of the Gansu archival management resource platform and covers the research and design of a text classification algorithm for archival text data. The text classification system has two main parts: training and classification testing. Training consists of: (1) word segmentation, using a modified forward maximum matching algorithm; (2) feature selection, implementing the document frequency and chi-square methods; (3) weight calculation, implementing the TF and TF*IDF weighting schemes; (4) classifier construction, implementing an SVM-based text classification algorithm for archival documents. During classifier design, cross-validation is used to optimize the classifier parameters, and a well-performing parameter set is selected for evaluating the test samples, so that the classification model meets the accuracy required in practice. The aim of this work is to reduce the time cost of SVM classification and to improve classification accuracy. The methods are compared by accuracy, recall, F1 value, and time efficiency, and the one that performs well on the data set is selected; after parameter tuning during classifier design, the classifier was applied in the text mining platform for the archives and achieved good results in practice.
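
The first three training steps named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract mentions a modified forward maximum matching algorithm whose modification is not described, so plain forward maximum matching is shown instead, and the function names, the toy dictionary, and the sample documents are all assumptions made for illustration. The chi-square score uses the standard 2x2 term/class contingency table, and the TF*IDF weight is the raw term count multiplied by log(N/df).

```python
import math
from collections import Counter


def forward_max_match(text, vocab, max_len=4):
    """Plain forward maximum matching (hypothetical helper): at each position
    take the longest dictionary word, falling back to a single character."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def chi_square(term, label, docs, labels):
    """Chi-square statistic of the 2x2 contingency table for (term, class)."""
    a = b = c = d = 0
    for tokens, y in zip(docs, labels):
        has_term = term in tokens
        if has_term and y == label:
            a += 1          # term present, document in class
        elif has_term:
            b += 1          # term present, document not in class
        elif y == label:
            c += 1          # term absent, document in class
        else:
            d += 1          # term absent, document not in class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def tf_idf(docs):
    """TF*IDF weights per document: raw term count times log(N / df)."""
    n = len(docs)
    df = Counter(t for tokens in docs for t in set(tokens))
    return [
        {t: tf * math.log(n / df[t]) for t, tf in Counter(tokens).items()}
        for tokens in docs
    ]


# Toy dictionary and corpus, invented for this example only.
vocab = {"数字", "档案", "档案馆", "文本", "分类"}
tokens = forward_max_match("数字档案馆文本分类", vocab)

docs = [["档案", "分类"], ["档案", "文本"], ["邮件", "分类"], ["邮件", "文本"]]
labels = ["archive", "archive", "mail", "mail"]
weights = tf_idf(docs)
```

In this toy corpus the term "档案" appears only in the "archive" class and receives a high chi-square score, while "分类" is spread evenly across both classes and scores zero, which is why chi-square selection would keep the former and drop the latter. The resulting weight vectors are the kind of input that the SVM classifier in step (4) would be trained on, with its parameters chosen by cross-validation; that step is not sketched here.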

Keywords/Search Tags:

Digital Archives, File Text Classification, SVM, Cross Validation

Related items

1	Text Emotional Classification Based On Text Mining
2	Research On The Application Of Text Classification Algorithm In The Management Of University Archives
3	Studies On Digital Archives Of Communications Of GuangDong Province
4	Digital File Information Security
5	Digital Archives Management System
6	Statistical Inference Of Classification Learning Algorithm Based On Blocked 3×2 Cross-Validation
7	Construction Of Archives And Cataloging System Of Digital Archives
8	Impact Analysis Of Classification Performance For Data Distribution In Cross-Validation
9	Research On The Personal Digital Archives
10	Study Of Spam-filtering Based On Text Classification