Text Classification Techniques in Digital Libraries: Applications and Research

Posted on: 2008-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: C L Zhang
GTID: 2208360212488235
Subject: Computer application technology
Abstract/Summary:
Classification, a major cognitive means by which humans understand the world, has existed for a long time. With the widespread use of computers, and especially with the emergence of the Internet and the rapid growth of online text, text classification has become a focus of computer science and related fields and a key research technology that attracts widespread attention. Text classification is also an important research area in digital libraries. At present, the digital library is becoming a hot area of information infrastructure worldwide and a focus of global cultural and technological competition in the 21st century. Metadata organization and construction are the foundation for building a digital library. Metadata is structured data about data; it provides an accurate description of the data content, semantics, and service mechanisms of a digital library.

This thesis studies text classification and its application to digital libraries, focusing on four issues: pre-extraction of support vectors, text assessment, automatic metadata extraction, and ontology-based metadata.

1. This thesis defines a method, called the face-to-face margin vector method of two convex hulls, that extracts support vectors from the given training examples. These vectors are then used as the training examples for the support vector algorithm, which greatly reduces the computational complexity of solving the quadratic programming problem and greatly improves the training speed of the support vector machine.

2. A method for evaluating text features is presented that can detect important features and noise features. Based on the classification results, it then assesses the quality of test samples and training samples. The assessment is used to optimize and expand the text database, improve the quality of the training samples step by step, and verify the weight of each sample in the classification model according to the sample's quality. This improves the performance of the classifier and its adaptability to a changing world.

3. Metadata extraction strategies and extraction rules. There are two main approaches in information extraction research: rule-based models and statistics-based models. The main idea of the rule-based model is to use the characteristics, structure, and other properties of text documents to find rules for extraction. The basic idea of the statistics-based model is to find a suitable model and adapt it to the application field by adjusting the model parameters and the training examples.

4. This thesis proposes a metadata ontology for digital libraries. Metadata provides the basic semantic information for the digital library and gives resources a microscopic structure. However, metadata does not completely solve the problem of semantic heterogeneity among information systems. An ontology can handle this problem well and provides models and methods for information organization and management, retrieval, and query.
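The rule-based route described in point 3 can be illustrated with a minimal sketch. It assumes documents carry labelled lines such as `Title:`, `Author:`, and `Keywords:`; these labels, the `RULES` table, and the `extract_metadata` function are hypothetical names chosen for illustration only, since the abstract does not state the thesis's actual extraction rules.

```python
import re

# Hypothetical rules: each metadata field is paired with a regular expression
# that exploits a structural cue (a labelled line) in the document text.
RULES = {
    "title": re.compile(r"^Title:\s*(.+)$", re.MULTILINE),
    "author": re.compile(r"^Author:\s*(.+)$", re.MULTILINE),
    "keywords": re.compile(r"^Keywords:\s*(.+)$", re.MULTILINE),
}


def extract_metadata(text):
    """Apply each rule to the text and return the fields that matched."""
    metadata = {}
    for field, pattern in RULES.items():
        match = pattern.search(text)
        if match is None:
            continue  # no rule fired for this field, so it is left out
        value = match.group(1).strip()
        if field == "keywords":
            # Split a comma-separated keyword line into a list.
            value = [k.strip() for k in value.split(",") if k.strip()]
        metadata[field] = value
    return metadata


sample = """Title: Text Classification in Digital Libraries
Author: C L Zhang
Keywords: Text Classification, SVM, Metadata

Body text starts here.
"""

print(extract_metadata(sample))
```

A statistics-based extractor would replace the hand-written `RULES` table with a model whose parameters are fitted to labelled training examples, which is the trade-off point 3 describes.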
Keywords/Search Tags: Text Classification, SVM, Digital Library, Metadata, Ontology