Research On The Application Of Text Classification Algorithm In The Management Of University Archives

Posted on:2015-08-14

Degree:Master

Type:Thesis

Country:China

Candidate:Y F Wang

Full Text:PDF

GTID:2298330467988609

Subject:Computer technology

Abstract/Summary:

In recent years, with the rapid development of higher education in China, the volume of archives produced by teaching, scientific research, and administrative management in colleges and universities has kept growing. To meet their file management needs, most colleges and universities use self-built or purchased records management software to manage these archives. The mainstream archives management software currently used in colleges does not provide automatic classification of archives. To reduce the workload of university archives staff, this thesis explores an automatic text classification method suited to college archives. It first reviews the development of Chinese text classification algorithms and the current state of text classification research, then surveys the research status of text representation, feature selection, feature extraction, feature weighting, classification algorithms, classifier construction, and classifier performance evaluation. Based on the characteristics of university archives, such as periodicity, repetitiveness, and informativeness, the author improves the steps of the traditional Chinese text classification algorithm, drawing on the working methods of archivists, who can usually determine the type of a file from its title and its responsible party. The thesis proposes a semantics-based multifactor weighted classification algorithm that relies on three word libraries (a category thesaurus, a stop-word library, and a responsible-party library) and takes the file title and responsible party as its main objects of analysis.
The algorithm first uses statistical analysis and manual experience to construct category words and their weights for each subcategory of the ten archive categories. The responsible-party thesaurus is built from information extracted from already classified archives, and the stop-word library is built from manual experience. To classify a file, the algorithm first uses the responsible-party information to determine which category the file belongs to, then removes stop words from the file title and extracts the category words it contains together with their weights, and finally compares the weighted sums to determine the file's subcategory. Experiments show that when the title and responsible-party information are complete, the algorithm's first-pass classification success rate reaches 93%. Archives that are not classified accurately are those whose title or responsible-party information is incomplete. Such archives can be handed over to manual processing; at the same time, adjusting the stop-word library, the category thesaurus, and the weight coefficients can further improve the classification success rate. The algorithm has been used to reduce the workload of university archives professionals, with good results.
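The classification steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the library contents, category names, weights, and the function name `classify` are all hypothetical, and whitespace splitting of English titles stands in for the Chinese word segmentation a real system would need.

```python
# Hypothetical responsible-party library: maps a responsible party to a category.
RESPONSIBLE_TO_CATEGORY = {
    "Academic Affairs Office": "teaching",
    "Research Office": "research",
}

# Hypothetical stop-word library (built by hand in the thesis).
STOP_WORDS = {"notice", "on", "the", "of", "regarding", "for", "and"}

# Hypothetical category thesaurus: category -> subcategory -> {word: weight}.
CATEGORY_THESAURUS = {
    "teaching": {
        "curriculum": {"course": 1.0, "syllabus": 0.8, "schedule": 0.5},
        "examination": {"exam": 1.0, "grades": 0.8, "schedule": 0.3},
    },
    "research": {
        "projects": {"project": 1.0, "grant": 0.8},
        "achievements": {"award": 1.0, "patent": 0.9},
    },
}


def classify(title, responsible):
    """Return (category, subcategory), or None when the file needs manual handling."""
    # Step 1: the responsible party determines the category.
    category = RESPONSIBLE_TO_CATEGORY.get(responsible)
    if category is None:
        return None  # incomplete or unknown responsible-party information

    # Step 2: remove stop words from the title (naive whitespace tokenization).
    tokens = [t for t in title.lower().split() if t not in STOP_WORDS]

    # Step 3: weighted sum of matched category words per subcategory.
    scores = {
        sub: sum(weights.get(t, 0.0) for t in tokens)
        for sub, weights in CATEGORY_THESAURUS[category].items()
    }

    # Step 4: the subcategory with the largest weighted sum wins.
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return None  # no category word matched; fall back to manual processing
    return category, best


if __name__ == "__main__":
    print(classify("Notice on the exam schedule", "Academic Affairs Office"))
    print(classify("Meeting minutes", "Research Office"))
```

Returning `None` mirrors the fallback described in the abstract: files whose title or responsible-party information is incomplete are passed to manual processing, and the libraries and weights can be tuned afterwards.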

Keywords/Search Tags:

Text classification, Archives in Universities, Short text, Semantic

Related items

1	Research On Key Techniques Of Short-text Representation And Classification Based On Hybrid Semantic
2	Research On Classification Of Short Text Sequences With Multi-Views Based On Semantic Representation
3	Research On Text Semantic Enhancement And Short Text Classification Method Based On Topic Model
4	Research On Short Text Classification Based Upon Convolution Feature Encoding And Attention Mechanism
5	The Study Of Short Text Classification Algorithm Based On Semantic
6	The Research And Implementation On Chinese Short Text Classification Technology
7	Research On Short Text Classification
8	Construction And Automatic Filtering Method Of Large Scale Short Text Summary Data Set
9	Research On Filtration And Classification Methods Of Large-Scale Short Text
10	Short Text Classification Based On Apriori Algorithm