Text Classification Technology And Applied Research

Posted on:2008-12-31

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Wang

GTID:2208360215965158

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of communications and the Internet, the amount of information of all kinds is growing exponentially, and text, the most typical information carrier, is no exception. To manage and retrieve valuable information, research on automatic text categorization (TC) has become very important. Text categorization is the assignment of predefined categories to documents based on their content, and it is a core task in text mining. This thesis describes the basic theory of text categorization, discusses its related techniques, constructs a text representation based on the vector space model, and studies currently available feature selection methods and categorization algorithms. The main research work is as follows:
(1) The whole text representation process is discussed: word segmentation, construction of a stop-word list, feature selection, weight computation, and generation of the vector space.
(2) Four text categorization methods (Naive Bayes, KNN, SVM, and decision tree) are introduced and compared.
(3) Three main components are analyzed: word segmentation techniques, feature selection and extraction algorithms, and categorization algorithms. On this basis, improved algorithms are proposed, and the categorization ability of the system is evaluated through experiments. The experimental results show that the improved algorithms are effective and that the system's categorization performance is satisfactory.
(4) Directions for future research on text categorization are outlined.
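The representation pipeline in item (1) and the KNN method in item (2) can be illustrated with a minimal Python sketch. It is not the thesis's own implementation. The stop-word list, the whitespace tokenizer (which assumes the text has already been word-segmented, as Chinese text would be after a segmenter), document-frequency thresholding as the feature selection method, the log(N/df) IDF variant, and the helper names tokenize, document_frequency, select_features, tfidf_vector and knn_classify are all illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}


def tokenize(text):
    """Split pre-segmented text on whitespace and drop stop words.

    Assumption: word segmentation has already been done, so words are
    separated by spaces.
    """
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def document_frequency(token_lists):
    """Count how many documents contain each term."""
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return df


def select_features(df, min_df=1):
    """Document-frequency feature selection: keep terms with df >= min_df."""
    return sorted(term for term, count in df.items() if count >= min_df)


def tfidf_vector(tokens, vocab, df, n_docs):
    """Build an L2-normalized TF-IDF vector over the selected vocabulary.

    Terms outside the vocabulary are ignored; weight = tf * log(N / df).
    """
    tf = Counter(tokens)
    vec = [tf[t] * math.log(n_docs / df[t]) for t in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def cosine(u, v):
    # Vectors are already unit length, so the dot product is the cosine.
    return sum(a * b for a, b in zip(u, v))


def knn_classify(query_vec, train_vecs, labels, k=3):
    """Similarity-weighted vote among the k most similar training documents."""
    sims = sorted(
        ((cosine(query_vec, v), label) for v, label in zip(train_vecs, labels)),
        reverse=True,
    )
    votes = Counter()
    for sim, label in sims[:k]:
        votes[label] += sim
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [
        "stock market prices rise",
        "market shares and stock trading",
        "team wins the football match",
        "football player scores in the match",
    ]
    labels = ["finance", "finance", "sports", "sports"]
    tokens = [tokenize(d) for d in train]
    df = document_frequency(tokens)
    vocab = select_features(df)
    vecs = [tfidf_vector(t, vocab, df, len(train)) for t in tokens]
    query = tfidf_vector(tokenize("stock prices fall in the market"), vocab, df, len(train))
    print(knn_classify(query, vecs, labels))  # finance
```

Normalizing every vector to unit length lets a plain dot product serve as cosine similarity, and weighting each neighbor's vote by its similarity keeps neighbors with zero overlap from deciding the label.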

Keywords/Search Tags:

Text Categorization, Vector Space Model, Feature Extraction, Categorization Algorithm

Related items

1	Modeling And Implementation Of Chinese Text Categorization System Based On SVM
2	Design And Realization Of Text Categorization System
3	The Research And Implementation Of Chinese Text Categorization
4	Research Of Text Categorization Based On Vector Space Model
5	Research On Chinese Text Categorization Algorithms Based On Technology Text
6	The Research And Implementation Of Chinese Text Categorization System
7	Research Of Text Categorization Base On Vector Space Model And Association Rules
8	Study For Text Categorization Based On Feature Weighting
9	A Study On Text Categorization Based On Machine Learning
10	An examination of KSS for feature selection for text categorization using support vector machines