
Text Classification Technology And Applied Research

Posted on: 2008-12-31   Degree: Master   Type: Thesis
Country: China   Candidate: X Y Wang   Full Text: PDF
GTID: 2208360215965158   Subject: Computer software and theory
Abstract/Summary:
With the rapid development of communication technology and the Internet, the volume of available information is growing exponentially, and text, the most typical information carrier, is no exception. To manage and retrieve valuable information, research on automatic text categorization (TC) has become very important. Text categorization is the assignment of predefined categories to documents based on their content, and it is a core task of text mining. This thesis describes the basic theory of text categorization, discusses the related techniques, constructs a text representation based on the vector space model, and studies the currently available feature selection methods and categorization algorithms. The main work is as follows: (1) The whole process of text representation is discussed: word segmentation, building a stop-word list, feature selection, weight computation, and generating the vector space. (2) Four text categorization methods, Naive Bayes, KNN, SVM, and decision trees, are introduced and compared. (3) The three main parts of the system, namely word segmentation techniques, feature selection and extraction algorithms, and categorization algorithms, are analysed; improved algorithms are proposed on this basis, and the categorization ability of the system is evaluated experimentally. The experimental results show that the improved algorithms are effective and that the categorization ability of the system is satisfactory. (4) Directions for future research on text categorization are outlined.
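The text representation steps listed in item (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not name the segmentation method, the stop-word list, the feature selection criterion, or the weighting formula, so whitespace splitting, the small `STOP_WORDS` set, a document-frequency threshold (`min_df`), and TF-IDF weighting are all assumptions chosen for illustration, as are the function names.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the thesis builds its own, which is not given here.
STOP_WORDS = {"the", "a", "of", "and", "is", "to", "in"}


def tokenize(text):
    """Stand-in for word segmentation: lowercase, split on whitespace,
    and drop stop words. Real segmentation (e.g. for Chinese) is harder."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def build_vocabulary(token_lists, min_df=2):
    """Assumed feature selection: keep terms that occur in at least
    min_df documents. Returns the sorted vocabulary and the DF counts."""
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))  # count each term once per document
    vocab = sorted(term for term, n in df.items() if n >= min_df)
    return vocab, df


def tfidf_vectors(docs, min_df=2):
    """Map each document to a vector of TF-IDF weights over the vocabulary,
    using weight = term frequency * log(N / document frequency)."""
    token_lists = [tokenize(d) for d in docs]
    vocab, df = build_vocabulary(token_lists, min_df)
    n_docs = len(docs)
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)  # missing terms count as 0
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


if __name__ == "__main__":
    docs = [
        "the stock market fell",
        "stock prices rose in the market",
        "the team won the match",
    ]
    vocab, vectors = tfidf_vectors(docs)
    print(vocab)
    for vec in vectors:
        print(vec)
```

Each document becomes one point in a space whose dimensions are the selected terms; any of the classifiers named in item (2) could then be trained on these vectors.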
Keywords/Search Tags: Text Categorization, Vector Space Model, Feature Extraction, Categorization Algorithm