Research On Automatic Classification And Prediction Of Archives Based On Decision Tree

Posted on: 2021-08-31  Degree: Master  Type: Thesis
Country: China  Candidate: K Zhang
GTID: 2518306461461934  Subject: Computer technology
Abstract/Summary:
With the rapid development of information technology, the volume of archives produced by agencies, enterprises, and institutions has exploded. Given the information resources and intellectual wealth contained in these archives, how archive management departments can manage and utilize them scientifically and efficiently has long been a research focus in the field. File classification is an indispensable part of file organization, and it is also the first step toward archive data mining. Traditional file classification is usually a manual process, which requires a great deal of manpower and material resources and suffers from poor real-time performance and low efficiency. A good classification model can not only improve classification efficiency but also replace manual processing. This dissertation mainly studies archival data preprocessing, feature processing, the decision tree classification algorithm, decision tree pruning, and a comparative analysis with the random forest classification algorithm. First, in the preprocessing stage, an improved stop word dictionary generation algorithm is proposed for different sample data sets and evaluated by comparative analysis; it effectively promotes dimensionality reduction of the file features and the correctness of the classification. Second, the ID3, CART, and random forest algorithms are used to construct file classification decision tree models, and a secondary pruning experiment is performed on the CART algorithm. Finally, preprocessing and feature processing in the classification process are improved based on archival attributes, yielding the classification accuracy and the classification rules; a large number of comparative experiments then lead to an efficient and easy-to-understand file classification model. This dissertation also implements a prototype system for automatic file classification and prediction in Python, which is applied to the classification of a certain document file and has achieved good results.
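The abstract names ID3 and CART but does not give their split criteria. As a rough illustration only, the sketch below computes the two textbook measures usually associated with them: information gain (ID3) and Gini impurity (CART). The toy documents, the term features `contract` and `meeting`, the class labels, and all function names are assumptions made for this example; they are not taken from the thesis or its prototype system.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def group_labels(rows, labels, feature):
    """Group the class labels by the value each row has for `feature`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    return groups


def information_gain(rows, labels, feature):
    """ID3-style criterion: entropy before the split minus weighted entropy after."""
    total = len(labels)
    groups = group_labels(rows, labels, feature)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def weighted_gini(rows, labels, feature):
    """CART-style criterion: weighted Gini impurity of the groups after the split."""
    total = len(labels)
    groups = group_labels(rows, labels, feature)
    return sum(len(g) / total * gini(g) for g in groups.values())


# Hypothetical toy data: each row marks whether a term occurs in a document.
rows = [
    {"contract": 1, "meeting": 0},
    {"contract": 1, "meeting": 1},
    {"contract": 0, "meeting": 1},
    {"contract": 0, "meeting": 0},
]
labels = ["legal", "legal", "admin", "admin"]

# Pick the feature with the highest information gain, as ID3 would at the root.
best_feature = max(["contract", "meeting"],
                   key=lambda f: information_gain(rows, labels, f))
```

On this toy set the term `contract` separates the two classes perfectly while `meeting` carries no information, so both criteria prefer `contract`. Real archive data would need the preprocessing and feature processing steps the abstract describes before any such measure is meaningful.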
Keywords/Search Tags: File classification, decision tree algorithm, stop word dictionary, classification prototype