Research On The Application Of Text Classification Algorithm In The Management Of University Archives

Posted on: 2015-08-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Wang
GTID: 2298330467988609
Subject: Computer technology
Abstract/Summary:
In recent years, with the rapid development of higher education in China, the volume of archives generated by teaching, scientific research, and administrative management in colleges and universities has kept growing. To meet their archive management needs, most colleges and universities use self-built or purchased records management software to manage these archives. At present, mainstream university archives management software does not provide automatic classification of archives. To reduce the workload of university archives staff, this thesis explores an automatic text classification method suited to university archives. It reviews the development of Chinese text classification algorithms and the current state of text classification algorithms, and surveys current research on text representation, feature selection, feature extraction, feature weighting, classification algorithms, classifier construction, and classifier performance evaluation. Based on the characteristics of university archives, such as periodicity, repetitiveness, and informativeness, the author modifies the steps of the traditional Chinese text classification algorithm and draws on the working methods of archivists, who can usually determine the type of an archive from its title (file name) and responsible-party information. The thesis proposes a semantics-based multifactor weighted classification algorithm that takes the archive title and responsible party as its main objects of analysis and is supported by three word libraries: a category thesaurus, a stop-word library, and a responsible-party library.
The algorithm first uses statistical analysis combined with human experience to construct category words and their weights for each subclass of the ten archive categories. The responsible-party library is built from information extracted from archives that have already been classified, and the stop-word library is built from human experience. To classify an archive, the algorithm first uses the responsible-party information to determine the categories to which the archive may belong. It then removes stop words from the title, extracts the category words it contains together with their weights, and determines the archive's subclass by comparing the weighted sums. Experiments show that when an archive's title and responsible-party information are complete, the first-pass classification success rate reaches 93%. The archives that are not classified accurately are those whose title or responsibility information is incomplete. Such archives can be handed over to manual processing, and adjusting the stop-word library, the category thesaurus, and the weight coefficients can further improve the classification success rate. The algorithm has been applied to reduce the workload of university archives professionals, with good results.
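The classification steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The library contents, the weights, the names `STOP_WORDS`, `RESPONSIBLE_TO_CATEGORIES`, `CATEGORY_THESAURUS`, and `classify`, and the use of English titles split on whitespace are all assumptions made for illustration. A real system for Chinese archive titles would need a word segmentation step, and the abstract does not specify the actual weighting scheme.

```python
# Hypothetical stop-word library (the thesis builds its own from human experience).
STOP_WORDS = {"notice", "on", "the", "of", "regarding"}

# Hypothetical responsible-party library: responsible party -> candidate categories.
RESPONSIBLE_TO_CATEGORIES = {
    "academic affairs office": ["teaching"],
    "research office": ["research"],
}

# Hypothetical category thesaurus: (category, subclass) -> {category word: weight}.
CATEGORY_THESAURUS = {
    ("teaching", "curriculum"): {"course": 0.6, "syllabus": 0.9},
    ("teaching", "examination"): {"exam": 0.9, "grade": 0.7},
    ("research", "projects"): {"project": 0.8, "grant": 0.9},
}


def classify(title, responsible):
    """Return (category, subclass), or None if the archive needs manual handling."""
    # Step 1: narrow the candidate categories using the responsible party.
    candidates = RESPONSIBLE_TO_CATEGORIES.get(responsible.strip().lower())
    if not candidates:
        return None  # incomplete or unknown responsible party

    # Step 2: remove stop words from the title (whitespace split is an assumption).
    tokens = [t for t in title.lower().split() if t not in STOP_WORDS]

    # Step 3: weighted sum of matched category words for each candidate subclass.
    best, best_score = None, 0.0
    for (category, subclass), words in CATEGORY_THESAURUS.items():
        if category not in candidates:
            continue
        score = sum(words.get(t, 0.0) for t in tokens)
        # Step 4: keep the subclass with the largest weighted sum.
        if score > best_score:
            best, best_score = (category, subclass), score
    return best


if __name__ == "__main__":
    print(classify("Notice on the exam grade review", "Academic Affairs Office"))
    print(classify("Grant funding report", "Unknown Office"))
```

In this sketch an archive with an unknown responsible party, or a title that matches no category word, returns `None`, which corresponds to handing the archive over to manual processing as the abstract describes.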
Keywords/Search Tags: Text classification, Archives in Universities, Short text, Semantic