Research And Implementation Of Algorithms Related To Chinese Text Classification

Posted on:2008-12-18

Degree:Master

Type:Thesis

Country:China

Candidate:R P Yu

GTID:2178360215964588

Subject:Computer application technology

Abstract/Summary:

With the rapid development of information technology, and especially the spread of Internet applications, the volume of electronic text has grown enormously. Organizing and processing such large amounts of data, and finding the information users want quickly and accurately, is a major challenge for information science and technology. One way to manage text efficiently is automatic text classification, an important intelligent information processing method with great application value in fields such as information filtering, information retrieval, text databases, and digital libraries.

This thesis discusses the applications of text classification in natural language processing, text mining, machine learning, and pattern recognition, and introduces text classification technology and its related algorithms. A Chinese automatic text classification system is designed and implemented in order to identify the problems and regularities in the algorithms used for text classification. The system consists of a training module and a classifying module. The training module includes: (1) Chinese text preprocessing. Chinese word segmentation based on the FMM (forward maximum matching) algorithm is implemented, and a practical stop-word dictionary is built through experiments. (2) Term selection. Five algorithms are implemented: Information Gain (IG), Mutual Information (MI), the χ² statistic, Cross Entropy (CE), and Document Frequency (DF). (3) Weight computation. Several weighting schemes are implemented, including Term Frequency (TF), TF*IDF, TF*(term evaluation value), and TF*IDF*(term evaluation value). (4) Classification model construction. Three statistical classification algorithms are implemented: class-center classification, Bayes, and KNN. In the classifying module, unlabeled text is classified using the classification model.

An evaluation component assesses the results and feeds them back to the training module to improve the training process. Experiments yield empirical values for the relevant parameters and identify the better-performing combinations of these algorithms. The experimental data can be used in information retrieval, information filtering, digital libraries, Web page classification, and similar applications.
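
The preprocessing step in (1) can be illustrated with a minimal sketch of forward maximum matching followed by stop-word removal. This is not the thesis's implementation: the function names, the `max_len` window, the toy dictionary, and the toy stop-word list are all assumptions chosen for illustration, and a real system would load much larger dictionaries from files.

```python
def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position, take the longest
    dictionary word that starts there; fall back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def remove_stop_words(tokens, stop_words):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in stop_words]


# Hypothetical toy resources, for illustration only.
toy_dictionary = {"中文", "文本", "分类", "文本分类", "算法"}
toy_stop_words = {"的"}

tokens = fmm_segment("中文文本分类的算法", toy_dictionary)
filtered = remove_stop_words(tokens, toy_stop_words)
print(tokens)
print(filtered)
```

With this toy dictionary the scanner prefers the longer entry "文本分类" over "文本", which is the defining behaviour of maximum matching. The surviving tokens would then feed the term selection and weighting steps in (2) and (3).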

Keywords/Search Tags:

Chinese Text Classification, Word Segmentation, Term Selection, Weight Computation

Related items

1	Research And Implementation Of Chinese Text Categorization
2	Researches On Hierarchical Chinese Text Classification
3	Research On Word Segmentation And Feature Selection Of Chinese Text Classification
4	Research On Core Technology Of The Chinese Text Classification
5	Research And Implementation Of Chinese Automatic Text Classification System Based On SVM
6	Research On Network Text Classification Technique
7	Research And Application Of Internet Chinese Text Classification
8	Research On Chinese Text Categorization Algorithms Based On Technology Text
9	The Research And Implementation Of Chinese Text Classification System
10	Research On Chinese Word Segmentation Based On Text And Audio