Word Frequency Extraction And Automatic Text Classification Methods In The Digital Library

Posted on:2003-11-21

Degree:Master

Type:Thesis

Country:China

Candidate:M R Ren

GTID:2208360065964011

Subject:Computer software and theory

Abstract/Summary:

Digital libraries are a new field of computer applications that draws on many technologies, including networking, multimedia, data warehousing, data mining, and copyright protection, and research on them is still at an early stage. Our group has developed a parallel digital library system based on a parallel computing environment. Besides the general functions of existing digital libraries, it supports querying based on document structure and content, which other digital library systems do not provide. The system can also build adaptive digital libraries for users with special needs.

This thesis designs and implements two subsystems: word frequency extraction and automatic text categorization. The automatic text categorization subsystem exploits the hierarchical structure of the predefined class scheme to construct a hierarchical classifier, overcoming a shortcoming of other text categorization systems, which treat the classes as a flat list. For the word frequency extraction subsystem, the thesis designs an efficient hash algorithm tailored to the characteristics of English and Chinese words, which effectively improves the performance of word frequency extraction and statistics. In addition, a text classification system based on the Vector Space Model is studied, and a new method for calculating word weights is proposed.
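The abstract does not describe the thesis's hash algorithm, so the Python sketch below only illustrates the general idea of hashing English and Chinese tokens differently. Everything in it is an assumption: single CJK ideographs (U+4E00 to U+9FFF) are counted by direct addressing into a fixed array, while lower-cased English words go into chained buckets keyed by a polynomial string hash. The names `WordFrequencyTable` and `extract`, the bucket count, and the one-character treatment of Chinese (no word segmentation) are hypothetical.

```python
import re

CJK_START, CJK_END = 0x4E00, 0x9FFF   # CJK Unified Ideographs block
CJK_SLOTS = CJK_END - CJK_START + 1    # one array slot per ideograph
ENGLISH_BUCKETS = 4099                 # prime bucket count (arbitrary choice)

# Hypothetical tokenizer: runs of ASCII letters, or single CJK characters.
TOKEN_RE = re.compile(r"[A-Za-z]+|[\u4e00-\u9fff]")


def _is_cjk(token):
    return len(token) == 1 and CJK_START <= ord(token) <= CJK_END


class WordFrequencyTable:
    """Hypothetical two-part hash table: direct addressing for single
    Chinese characters, chained buckets for English words."""

    def __init__(self):
        self.cjk_counts = [0] * CJK_SLOTS
        self.buckets = [[] for _ in range(ENGLISH_BUCKETS)]

    @staticmethod
    def _english_hash(word):
        # Polynomial rolling hash reduced modulo the bucket count.
        h = 0
        for ch in word:
            h = (h * 31 + ord(ch)) % ENGLISH_BUCKETS
        return h

    def add(self, token):
        if _is_cjk(token):
            # The code point offset is the slot index, so no collisions occur.
            self.cjk_counts[ord(token) - CJK_START] += 1
            return
        word = token.lower()
        bucket = self.buckets[self._english_hash(word)]
        for entry in bucket:          # walk the collision chain
            if entry[0] == word:
                entry[1] += 1
                return
        bucket.append([word, 1])

    def count(self, token):
        if _is_cjk(token):
            return self.cjk_counts[ord(token) - CJK_START]
        word = token.lower()
        for w, c in self.buckets[self._english_hash(word)]:
            if w == word:
                return c
        return 0


def extract(text):
    """Tokenize `text` and accumulate frequencies in a WordFrequencyTable."""
    table = WordFrequencyTable()
    for tok in TOKEN_RE.findall(text):
        table.add(tok)
    return table
```

For example, `extract("The cat saw the 图书馆 cat. 图书").count("the")` returns 2, because English tokens are lower-cased before hashing. Direct addressing suits Chinese characters because the character set is bounded and dense, whereas English words form an open vocabulary that needs a general-purpose hash.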
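The abstract says the categorization subsystem builds a hierarchical classifier from the class hierarchy instead of treating classes as flat, and the keywords list Bayesian theory, but no details are given. A minimal sketch under those assumptions: each internal node of a hypothetical class tree gets its own multinomial naive Bayes model (add-one smoothing) over its children, trained on every document filed anywhere under each child's subtree, and a document is classified by descending greedily from the root. `leaves_under`, `NaiveBayesNode` and `HierarchicalClassifier` are illustrative names, not the thesis's code.

```python
import math
from collections import Counter


def leaves_under(node, children):
    """Return the set of leaf classes in the subtree rooted at `node`."""
    kids = children.get(node, [])
    if not kids:
        return {node}
    return set().union(*(leaves_under(k, children) for k in kids))


class NaiveBayesNode:
    """Multinomial naive Bayes with add-one (Laplace) smoothing that chooses
    among the children of a single node in the class tree."""

    def __init__(self, docs_by_child):
        # docs_by_child: child label -> non-empty list of token lists
        self.vocab = {t for docs in docs_by_child.values() for d in docs for t in d}
        total_docs = sum(len(docs) for docs in docs_by_child.values())
        self.log_prior, self.counts, self.totals = {}, {}, {}
        for child, docs in docs_by_child.items():
            self.log_prior[child] = math.log(len(docs) / total_docs)
            self.counts[child] = Counter(t for d in docs for t in d)
            self.totals[child] = sum(self.counts[child].values())

    def predict(self, tokens):
        v = len(self.vocab)

        def log_posterior(child):
            counts, n = self.counts[child], self.totals[child]
            # Tokens never seen at this node are ignored.
            return self.log_prior[child] + sum(
                math.log((counts[t] + 1) / (n + v)) for t in tokens if t in self.vocab
            )

        return max(self.log_prior, key=log_posterior)


class HierarchicalClassifier:
    """Top-down classifier: one naive Bayes model per internal node."""

    def __init__(self, children, root, training):
        # children: node -> list of child nodes; training: (tokens, leaf label) pairs
        self.children, self.root, self.models = children, root, {}
        for node, kids in children.items():
            docs_by_child = {}
            for kid in kids:
                under = leaves_under(kid, children)
                docs = [toks for toks, leaf in training if leaf in under]
                if docs:  # skip children with no training documents
                    docs_by_child[kid] = docs
            if docs_by_child:
                self.models[node] = NaiveBayesNode(docs_by_child)

    def classify(self, tokens):
        node = self.root
        while node in self.models:  # stop at a node without a model (a leaf)
            node = self.models[node].predict(tokens)
        return node
```

With a tree such as `{"root": ["science", "arts"], "science": ["physics", "biology"], ...}`, each model only separates siblings, so the per-node models stay small. The trade-off of greedy top-down descent is that a wrong decision near the root cannot be corrected further down.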
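The abstract does not state the proposed word-weighting formula, so the following sketch of a Vector Space Model classifier uses the standard TF-IDF weight, w(t, d) = tf(t, d) * log(N / df(t)), purely as a stand-in; it is not the thesis's method. Documents are token lists, each class is represented by the mean (centroid) of its training vectors, and a new document goes to the class whose centroid has the highest cosine similarity. All function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Standard TF-IDF (a stand-in weight): tf(t, d) * log(N / df(t))."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n / c) for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(d).items()} for d in docs]
    return vectors, idf


def vectorize(tokens, idf):
    """Weight a new document with the training IDF; unseen terms are dropped."""
    return {t: tf * idf[t] for t, tf in Counter(tokens).items() if t in idf}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroids(vectors, labels):
    """Average the training vectors of each class."""
    sums, counts = {}, Counter(labels)
    for vec, lab in zip(vectors, labels):
        acc = sums.setdefault(lab, Counter())
        for t, w in vec.items():
            acc[t] += w
    return {lab: {t: w / counts[lab] for t, w in acc.items()} for lab, acc in sums.items()}


def classify(tokens, cents, idf):
    """Assign the class whose centroid is most cosine-similar to the document."""
    q = vectorize(tokens, idf)
    return max(cents, key=lambda lab: cosine(q, cents[lab]))
```

Replacing the body of `tfidf_vectors` and `vectorize` is the natural place to plug in a different weighting scheme while keeping the centroid and cosine steps unchanged.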

Keywords/Search Tags:

Automatic Text Categorization, Word Frequency Extraction, Bayesian Theory, Vector Space Model

Related items

1	Chinese Text Data Classification
2	Research On Chinese Text Categorization Algorithms Based On Technology Text
3	The Research Of An Automatic Recommended Model Of Reviewer For A Submission System
4	Design And Realization Of Text Categorization System
5	Study On Text Category Oriented Chinese Text Mining And Its Implementation
6	Research And Improvement Of Automatic Text Classification Algorithm Based On The Vector Space Model
7	Research Of Text Categorization Based On Vector Space Model
8	Application Of Rough Set Theory In Chinese Text Categorization
9	The Research And Implementation Of Chinese Text Categorization
10	Research Of Text Categorization Based On Vector Space Model And Association Rules