
The Research Of Chinese Text Categorization Based On Rough Set In Spam Filtering

Posted on: 2012-04-29  Degree: Master  Type: Thesis
Country: China  Candidate: Y Li  Full Text: PDF
GTID: 2178330335463673  Subject: Computer application technology
Abstract/Summary:
As one of the key technologies of information processing, text classification plays an important role in areas such as information organization and information filtering, and it has broad application prospects and practical value. Rough set theory can handle complex data and information systems and has become a new mathematical tool for dealing with fuzzy and uncertain problems. This thesis studies Chinese text classification based on rough set theory. The main work is as follows. First, an improved TF-IDF weighting method is proposed and applied to feature selection. The improved method and the CHI statistic each perform feature selection separately, and the intersection of the two results is taken as the feature vector, which effectively filters out weakly representative feature items. Second, an attribute reduction algorithm suited to text classification is proposed by combining attribute reduction with the feature selection method. The algorithm first retains the important attributes of the information system table; the feature items are then sorted by their feature selection evaluation scores and added to the reduced attribute set in that order until the discernibility matrix is empty. Third, based on this research, a prototype spam filtering system based on text classification is implemented. The system mainly comprises a text preprocessing module, a statistics and text representation module, a feature selection module, a discretization and attribute reduction module, and a rule matching module. A multi-level rule matching method is also proposed. The work in the thesis is verified, and the results indicate that the attribute reduction algorithm and the rule matching algorithm presented in the thesis effectively improve text classification performance.
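The attribute reduction step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the toy decision table, and the scores are hypothetical. It also assumes that the "important attributes" are the core attributes (those that appear alone in some discernibility entry), and that a ranked feature is skipped when it discerns no remaining pair.

```python
from itertools import combinations


def discernibility_entries(rows, labels):
    """Collect, for every pair of objects with different decisions,
    the set of condition attributes on which the two objects differ."""
    entries = []
    for i, j in combinations(range(len(rows)), 2):
        if labels[i] != labels[j]:
            diff = frozenset(
                a for a in range(len(rows[i])) if rows[i][a] != rows[j][a]
            )
            if diff:  # an empty set is an inconsistent pair; no attribute discerns it
                entries.append(diff)
    return entries


def reduce_attributes(rows, labels, scores):
    """Return a reduced attribute list: core attributes first, then
    attributes in descending score order until every entry is covered."""
    entries = discernibility_entries(rows, labels)

    # Assumption: "important attributes" = core, i.e. singleton entries.
    core = {next(iter(e)) for e in entries if len(e) == 1}
    reduct = sorted(core)

    # Entries already discerned by a core attribute are dropped.
    remaining = [e for e in entries if not (e & core)]

    # Rank the other attributes by their feature selection score.
    ranked = sorted(
        (a for a in range(len(scores)) if a not in core),
        key=lambda a: scores[a],
        reverse=True,
    )
    for a in ranked:
        if not remaining:  # the discernibility matrix is "empty"
            break
        if any(a in e for e in remaining):  # assumption: skip useless attributes
            reduct.append(a)
            remaining = [e for e in remaining if a not in e]
    return reduct


# Hypothetical discretized table: one row per message, one column per feature.
rows = [
    (1, 0, 1, 0),
    (1, 1, 0, 0),
    (0, 1, 0, 1),
    (0, 0, 1, 1),
    (1, 0, 1, 1),
    (1, 1, 1, 1),
]
labels = ["spam", "spam", "ham", "ham", "ham", "spam"]
scores = [0.3, 0.2, 0.8, 0.7]  # hypothetical feature selection scores

print(reduce_attributes(rows, labels, scores))  # → [1, 3, 2]
```

In this toy table, attributes 1 and 3 form the core, and attribute 2 is added next because it has the highest score among the rest and discerns the one pair the core leaves uncovered.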
Keywords/Search Tags:Chinese text classification, rough set, feature selection, attribute reduction, level rule match