Research And Application Of Text Classification Techniques In Digital Libraries

Posted on:2008-01-18

Degree:Master

Type:Thesis

Country:China

Candidate:C L Zhang

GTID:2208360212488235

Subject:Computer application technology

Abstract/Summary:

Classification, a major cognitive means by which humans understand the world, has existed for a long time. With the widespread use of computers, and especially with the emergence of the Internet and the rapid growth of online text, text classification has become a focus of computer science and related fields and a key research technology that attracts broad attention. Text classification is also an important research area in digital libraries. At present, digital libraries are becoming a hot area of information infrastructure worldwide and a focus of global cultural and technological competition in the 21st century. Organizing and constructing metadata is the foundation for building a digital library. Metadata is structured data about data; it provides an accurate description of the data content, semantics, and service mechanisms for the digital library. This thesis studies text classification and its application in digital libraries, focusing on the following four issues: pre-extraction of support vectors, text assessment, automatic metadata extraction, and ontology metadata.

1. This thesis defines a method, called the face-to-face margin vector method of two convex hulls, that extracts support vectors from the given training examples in advance. The extracted vectors are then used as the training examples for the support vector algorithm, which greatly reduces the computational complexity of solving the quadratic programming problem and greatly improves the training speed of the support vector machine.

2. A method for evaluating text features is presented that can detect important features and noise features. It then assesses the quality of the test samples and training samples according to the classification results, optimizes and expands the text database, improves the quality of the training samples step by step, widens the range of the text database, and verifies each sample's weight in the classification model according to the sample's quality, which improves the classifier's performance and its adaptability to a changing world.

3. Metadata extraction strategies and extraction rules. There are two main approaches in information extraction research: rule-based models and statistics-based models. The main idea of a rule-based model is to use the characteristics and structure of text documents to find rules for extraction. The basic idea of a statistics-based model is to find a suitable model and adapt it to the application field by adjusting the model parameters and the training examples.

4. This thesis proposes a metadata ontology for digital libraries. Metadata provides the basic semantic information for the digital library and gives resources a microscopic structure, but metadata does not completely solve the problem of semantic heterogeneity among information systems. An ontology can handle this problem well and provides models and methods for information organization and management, retrieval, and query.
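
The pre-extraction idea in item 1 can be illustrated with a small sketch. The abstract does not give the details of the face-to-face margin vector method, so the code below is not the author's algorithm. It substitutes a simple stand-in heuristic: for each of two classes, keep the samples that lie closest to the opposite class, since those are the most likely support vector candidates. The function names, the `keep_per_class` parameter, and the toy data are all assumptions made for illustration.

```python
import math


def nearest_opposite_distance(point, others):
    """Distance from `point` to the closest sample of the other class."""
    return min(math.dist(point, other) for other in others)


def preselect_boundary_candidates(samples, labels, keep_per_class):
    """Return sorted indices of likely support vector candidates.

    Stand-in heuristic (an assumption, not the thesis's method): for each
    of the two classes, keep the `keep_per_class` samples that are nearest
    to any sample of the opposite class.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")

    selected = []
    for cls in classes:
        own = [(i, x) for i, (x, y) in enumerate(zip(samples, labels)) if y == cls]
        other = [x for x, y in zip(samples, labels) if y != cls]
        # Rank this class's samples by how close they are to the other class.
        ranked = sorted(own, key=lambda item: nearest_opposite_distance(item[1], other))
        selected.extend(i for i, _ in ranked[:keep_per_class])
    return sorted(selected)


# Hypothetical toy data: two well-separated 2-D classes.
samples = [(0.0, 0.0), (0.5, 0.2), (1.8, 1.0), (4.0, 4.0), (4.5, 4.2), (2.6, 2.0)]
labels = [-1, -1, -1, 1, 1, 1]

candidates = preselect_boundary_candidates(samples, labels, keep_per_class=1)
print(candidates)
```

Only the selected samples would then be passed to an SVM trainer, so the quadratic program is solved over far fewer points than the full training set. Whether the reduced set preserves the true support vectors depends on the selection rule, and that is the part the thesis's convex hull method is meant to address.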

Keywords/Search Tags:

Text Classification, SVM, Digital Library, Metadata, Ontology

Related items

1	The Research Of Structured Management About Network Library Resources
2	Research On Digital Library And Automatic Document Classification
3	Application And Implementation Of Digital Library System Of CPC Tianjin Municipal Committee Party School
4	A Study On The Application Of Metadata In Digital Library
5	Enhancing a domain-specific digital library with metadata based on hierarchical controlled vocabularies
6	Exploration And Application Of Metadata Technology In Building Of Digital Library
7	Application Of Metadata Construction On Digital Library Business
8	Statistics-based Text Classification
9	Design And Implementation Of University Digital Library System Based On Hadoop
10	The Research Of Web Text Classifier In The Digital Library