Research on Text Categorization Based on Vector Space Model and Association Rules

Posted on: 2006-07-24
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Ye
Full Text: PDF
GTID: 2178360212482470
Subject: Software engineering
Abstract/Summary:
Automatic text categorization draws on information retrieval, pattern recognition, and machine learning. The quality of web search engines improves when this technology is applied to information. Its task is to assign documents to classes predefined by people. Traditional text categorization involves two phases. First, features are extracted from the text and selected by a feature selection function, which maps each text to a point in the vector space model. Next, a specific text categorization algorithm is applied to group the texts. However, the vector space model has an inherent defect: it cannot differentiate the relationships between terms in a document. This paper proposes a two-phase feature selection algorithm to compensate for this weakness in feature extraction. First, it calculates the fall in document frequency (DF) in every class to delete "ordinary words" and "occasional words". Second, based on the result of the first filter, it applies a classification-based-on-associations algorithm to mine the relationships between feature items and classes, then merges the condition sets of rules that have the same class label. Experiments show that the two-phase feature selection algorithm effectively improves the quality of text categorization...
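The two phases described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's actual algorithm: the abstract does not define the "fall of DFs", so the sketch reads it as the gap between a term's highest and lowest per-class document frequency. The function names, the thresholds (`min_peak`, `min_drop`, `min_support`, `min_confidence`), and the brute-force rule enumeration (standing in for a real association-rule miner such as Apriori) are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def class_document_frequencies(docs):
    """docs: list of (tokens, label) pairs.
    Returns {label: {term: fraction of that class's documents containing term}}."""
    counts = defaultdict(Counter)
    sizes = Counter()
    for tokens, label in docs:
        sizes[label] += 1
        counts[label].update(set(tokens))  # count each term once per document
    return {
        label: {term: c / sizes[label] for term, c in term_counts.items()}
        for label, term_counts in counts.items()
    }


def filter_terms(docs, min_peak=0.75, min_drop=0.5):
    """Phase 1 (assumed reading): drop 'occasional' and 'ordinary' words."""
    rel_df = class_document_frequencies(docs)
    vocab = {term for df in rel_df.values() for term in df}
    kept = set()
    for term in vocab:
        values = [rel_df[label].get(term, 0.0) for label in rel_df]
        peak, low = max(values), min(values)
        if peak < min_peak:
            continue  # "occasional word": rare in every class
        if peak - low < min_drop:
            continue  # "ordinary word": about equally common in all classes
        kept.add(term)
    return kept


def mine_and_merge_rules(docs, terms, min_support=0.5,
                         min_confidence=0.8, max_size=2):
    """Phase 2 (assumed reading): find rules itemset -> class that pass the
    support and confidence thresholds, then merge the condition sets of all
    rules sharing a class label into one feature set per class."""
    n = len(docs)
    itemset_count = Counter()
    rule_count = Counter()
    for tokens, label in docs:
        present = sorted(set(tokens) & terms)
        for size in range(1, max_size + 1):
            for itemset in combinations(present, size):
                itemset_count[itemset] += 1
                rule_count[(itemset, label)] += 1
    merged = defaultdict(set)
    for (itemset, label), hits in rule_count.items():
        support = hits / n
        confidence = hits / itemset_count[itemset]
        if support >= min_support and confidence >= min_confidence:
            merged[label].update(itemset)
    return dict(merged)
```

Calling `filter_terms(docs)` on a labeled corpus and passing the result to `mine_and_merge_rules(docs, kept)` yields one merged feature set per class, which could then serve as the reduced vocabulary for the vector space representation. Enumerating every itemset up to `max_size` is only practical for small vocabularies.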
Keywords/Search Tags: vector space model, association rule, text categorization, feature selection, data mining