| As one of the key technologies of information processing, text classification plays an important role in areas such as information organization and information filtering, and it has broad applications and practical value. Rough set theory can handle complex data and information systems and has become a mathematical tool for processing fuzzy and uncertain problems. This paper studies Chinese text classification based on rough set theory. The main work is as follows. First, an improved TF-IDF weighting method is proposed and applied to feature selection. The improved method and the CHI statistic each perform feature selection separately, and the intersection of the two results is taken as the feature vector, which effectively filters out weakly representative feature items. Second, an attribute reduction algorithm suited to text classification is proposed by combining attribute reduction with the feature selection method. The algorithm first retains the important attributes of the information system table; the feature items are then sorted by their feature selection evaluation scores and put into the reduced attribute set in that order until the discernibility matrix is empty. Third, based on this research, the paper implements a prototype spam filtering system based on text classification. The system consists of a text preprocessing module, a statistics and text representation module, a feature selection module, a discretization and attribute reduction module, and a rule matching module. A multi-level rule matching method is also proposed. The work described in the paper is verified, and the results indicate that the attribute reduction algorithm and the rule matching algorithm in the paper effectively improve the performance of text classification. |
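
The two selection steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, and it rests on several assumptions: the function names (`select_features`, `discernibility_entries`, `reduce_attributes`) are hypothetical, the improved TF-IDF and CHI scores are taken as precomputed dictionaries, the "important attributes" are interpreted as the core (attributes that appear alone in a discernibility matrix entry), and an attribute that distinguishes no remaining pair is skipped rather than added.

```python
from itertools import combinations


def select_features(tfidf_scores, chi_scores, k):
    """Intersect the top-k terms under two scoring methods (assumed precomputed)."""
    def top(scores):
        return set(sorted(scores, key=scores.get, reverse=True)[:k])
    return top(tfidf_scores) & top(chi_scores)


def discernibility_entries(rows, labels):
    """For each pair of objects with different labels, collect the attribute
    indices on which the two objects differ (one discernibility matrix entry)."""
    entries = []
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        if labels[i] != labels[j]:
            diff = frozenset(k for k, (x, y) in enumerate(zip(a, b)) if x != y)
            if diff:
                entries.append(diff)
    return entries


def reduce_attributes(rows, labels, scores):
    """Greedy reduction: keep the core, then add attributes in descending
    score order until every discernibility entry is covered."""
    entries = discernibility_entries(rows, labels)
    # Assumption: "important attributes" = core = singleton entries.
    reduct = {next(iter(e)) for e in entries if len(e) == 1}
    remaining = [e for e in entries if not (e & reduct)]
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    for attr in order:
        if not remaining:  # the matrix is "empty": all pairs are discerned
            break
        if attr in reduct:
            continue
        # Assumption: skip attributes that distinguish no remaining pair.
        if any(attr in e for e in remaining):
            reduct.add(attr)
            remaining = [e for e in remaining if attr not in e]
    return sorted(reduct)


if __name__ == "__main__":
    tfidf = {"free": 0.9, "offer": 0.8, "meeting": 0.1}
    chi = {"free": 5.0, "meeting": 4.0, "offer": 1.0}
    print(select_features(tfidf, chi, k=2))

    # Discretized decision table: one row per document, one column per feature.
    rows = [(1, 1, 0), (0, 1, 1), (1, 0, 1), (0, 0, 0)]
    labels = ["spam", "spam", "ham", "ham"]
    print(reduce_attributes(rows, labels, scores=[0.9, 0.2, 0.5]))
```

In this toy table no attribute forms a singleton entry, so the reduct is determined entirely by the score order, which shows how the feature selection scores steer the reduction.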