The Research Of Chinese Text Categorization Based On Rough Set In Spam Filtering

Posted on:2012-04-29

Degree:Master

Type:Thesis

Country:China

Candidate:Y Li


GTID:2178330335463673

Subject:Computer application technology

Abstract/Summary:

As one of the key technologies of information processing, text classification plays an important role in many areas, such as information organization and information filtering, and has wide application and practical value. Rough set theory can handle complex data and information systems, and it has become a new mathematical tool for dealing with vague and uncertain information. This thesis studies Chinese text classification based on rough set theory. The main work is as follows.

An improved TF-IDF weighting method is proposed and applied to feature selection. Feature selection is performed separately with this method and with the CHI statistic, and the intersection of the two results is taken as the feature vector. This effectively filters out weakly representative feature items.

Building on this feature selection method, the thesis proposes an attribute reduction algorithm suited to text classification. The algorithm first retains the important attributes of the information system table. The remaining feature items are then sorted by their feature-selection evaluation scores and added to the reduct attribute set in that order until the discernibility matrix is empty.

Based on the above research, the thesis implements a prototype spam filtering system based on text classification. The system consists mainly of a text preprocessing module, a statistics and text representation module, a feature selection module, a discretization and attribute reduction module, and a rule matching module. A multi-level rule matching method is also proposed. The work in this thesis was verified, and the results indicate that the proposed attribute reduction algorithm and rule matching algorithm effectively improve text classification performance.
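The feature selection and attribute reduction steps described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not from the thesis. The abstract does not define the "improved" TF-IDF, so plain corpus-level TF-IDF stands in for it. Documents are assumed to be already word-segmented token lists. Each method keeps its top `k` terms, and a term's CHI score is its maximum over categories. The "important attributes" are taken to be the rough-set core, meaning attributes that appear as singleton discernibility-matrix entries. A candidate attribute is added only if it still covers an uncovered entry. All function names are hypothetical.

```python
import math
from itertools import combinations


def chi_square(docs, labels, term, category):
    """CHI statistic of `term` for `category`; docs are pre-segmented token lists."""
    n = len(docs)
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == category)
    b = sum(1 for d, y in zip(docs, labels) if term in d and y != category)
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y == category)
    d_ = n - a - b - c
    denom = (a + c) * (b + d_) * (a + b) * (c + d_)
    return n * (a * d_ - b * c) ** 2 / denom if denom else 0.0


def tfidf_score(docs, term):
    """Plain corpus-level TF-IDF (hypothetical stand-in for the improved weighting)."""
    df = sum(1 for d in docs if term in d)
    if df == 0:
        return 0.0
    idf = math.log(len(docs) / df)
    return sum(d.count(term) / len(d) for d in docs if d) * idf


def select_features(docs, labels, k):
    """Top-k by TF-IDF intersected with top-k by CHI; returns (features, chi scores)."""
    vocab = sorted({t for d in docs for t in d})
    cats = sorted(set(labels))
    tfidf = {t: tfidf_score(docs, t) for t in vocab}
    chi = {t: max(chi_square(docs, labels, t, c) for c in cats) for t in vocab}
    top_tfidf = set(sorted(vocab, key=lambda t: (-tfidf[t], t))[:k])
    top_chi = set(sorted(vocab, key=lambda t: (-chi[t], t))[:k])
    kept = top_tfidf & top_chi  # drop weakly representative terms
    # Order surviving features by CHI score for use in the reduction step.
    return sorted(kept, key=lambda t: (-chi[t], t)), chi


def discernibility_entries(table, decisions, attrs):
    """Non-empty discernibility-matrix entries for object pairs with different decisions."""
    entries = []
    for i, j in combinations(range(len(table)), 2):
        if decisions[i] != decisions[j]:
            diff = frozenset(a for a in attrs if table[i][a] != table[j][a])
            if diff:
                entries.append(diff)
    return entries


def reduce_attributes(table, decisions, attrs, score):
    """Core first, then attributes in descending score until every entry is covered."""
    entries = discernibility_entries(table, decisions, attrs)
    # Core (assumed meaning of "important attributes"): singleton entries.
    reduct = {a for e in entries if len(e) == 1 for a in e}
    remaining = [e for e in entries if not (e & reduct)]
    for a in sorted(attrs, key=lambda x: (-score(x), x)):
        if not remaining:
            break  # discernibility matrix is "empty": every pair is discerned
        if any(a in e for e in remaining):
            reduct.add(a)
            remaining = [e for e in remaining if a not in e]
    return reduct
```

For example, with the four toy documents `[["free", "money", "win"], ["meeting", "report"], ["free", "win", "prize"], ["report", "schedule"]]` labeled `spam, ham, spam, ham` and `k = 3`, the top three TF-IDF terms are `meeting`, `report`, and `schedule`, and the top three CHI terms are `free`, `report`, and `win`. Only `report` survives the intersection. The reduction step expects the discretized information table as a list of dicts that map each attribute to its value. It is intended to return a reduct that still discerns every pair of objects with different decisions.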

Keywords/Search Tags:

Chinese text classification, rough set, feature selection, attribute reduction, level rule match

Related items

1	Research On Text Emotion Classification Based On Rough Set
2	The Research On Text Classification Technology Based On The Rough Set Theory
3	Research On Optimization Of Text Classification Based On Improved Rough Set Model
4	Research On Integrated Classification Algorithm Based On Rough Set Attribute Reduction
5	Studies On Feature Selection Method Based On Heuristic Attribute Reduction Of Rough Set
6	Study On Chinese Text Classification Algorithm Based On Rough Set And It's Application
7	Research And Application Of Text Feature Reduction And Classification Rule Extraction
8	Equivalent Cluster Of Fuzzy Rough Set Based Indeterminate Attribute Reduction And Its Application On The Fashion's Match
9	Research On Text Classification Based On Rough Set
10	An Attribute Reduction Algorithm Based On Dynamic Neighborhood Rough Set For Text Classification