Research On Digital Library And Automatic Document Classification

Posted on: 2005-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Yang
GTID: 2168360125959390
Subject: Computer application technology
Abstract/Summary:
With the rapid development of computers and the Internet, the types of online information and the usable resources are increasingly abundant. A systematic technology is therefore urgently needed to manage digital information resources, and the digital library came into being to meet this need. The digital library is an emerging computer application involving many technologies, such as the Internet, multimedia, data warehousing, data mining, and copyright protection, and its application and commercial prospects are very promising.

This thesis studies digital library technology: the concept, characteristics, and technical system structure of the digital library, and the metadata technologies needed to construct it, including XML and Dublin Core. On this basis, the author proposes an XML-based metadata format that is embedded in the resources themselves. Existing systems such as Wangfan and Qinghua CNKI adopt specific resource formats: Wangfan uses pdf and CNKI uses caj. The resource format developed by the author can be displayed and read directly in the browser, so users do not need a dedicated reader.

Document classification is a very important component of the digital library, and many classification algorithms exist both in China and abroad. The thesis discusses in detail SVM, a classification technique, and its excellent performance in document classification. It also points out a shortcoming: SVM classification relies on an extensive corpus, and its advantage does not show when documents are relatively short. The author therefore studies the association rules algorithm and puts forward the term-item algorithm, which performs well in the classification of short documents.
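
The idea of metadata embedded in an XML-based resource can be illustrated with a minimal Python sketch. It is not the format proposed in the thesis, which the abstract does not specify. The wrapper elements `resource`, `metadata`, and `content` and the helper names `build_resource` and `read_metadata` are hypothetical; only the Dublin Core element namespace and the `title`, `creator`, and `subject` elements come from the Dublin Core standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_resource(title, creator, subject, body_text):
    """Return an XML string that carries Dublin Core metadata next to the content.

    The <resource>/<metadata>/<content> layout is an assumption made for
    illustration, not the layout defined in the thesis.
    """
    root = ET.Element("resource")
    meta = ET.SubElement(root, "metadata")
    for name, value in (("title", title), ("creator", creator), ("subject", subject)):
        element = ET.SubElement(meta, f"{{{DC_NS}}}{name}")
        element.text = value
    content = ET.SubElement(root, "content")
    content.text = body_text
    return ET.tostring(root, encoding="unicode")


def read_metadata(xml_string):
    """Extract the embedded Dublin Core fields as a plain dictionary."""
    root = ET.fromstring(xml_string)
    # Tags look like "{namespace}title"; keep only the local name.
    return {el.tag.split("}", 1)[1]: el.text for el in root.find("metadata")}


xml_doc = build_resource(
    "Research On Digital Library And Automatic Document Classification",
    "C Y Yang",
    "digital library",
    "Full text of the document goes here.",
)
fields = read_metadata(xml_doc)
```

Because the metadata travels inside the same XML document as the content, a catalog or classifier can read the fields without a separate index file, and a browser-side stylesheet could render the same file for reading.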
Keywords/Search Tags: digital library, metadata, auto classification, SVM, term-item algorithm