With the rapid development of mobile communication technology, text messaging has become an important means of communication for people of all ages. At the same time, spam messages have become a problem that troubles both mobile phone users and telecom operators. At present, spam message filtering techniques generally include whitelists and blacklists, rule-based filtering, and keyword matching. Content-based filtering is the main technique for solving the spam message classification problem. To address these problems, this paper applies text mining technology to a mobile phone spam message filtering model. It designs and implements a dual-filtering system for spam messages based on rough sets and the KNN algorithm. The model comprises preprocessing, feature extraction, term weighting, attribute reduction, and short message filtering and classification.

This paper contains the following main elements:
(1) Analyzing several feature selection algorithms and comparing their advantages and disadvantages through experiments.
(2) Proposing and elaborating a method for computing term weights based on information gain and variance.
(3) Proposing a dual-filtering method in which KNN and rough sets are combined to form a filter. The rough set reduction algorithm lowers the dimension of the vector space, reducing the number of features and the size of the test message vector space, and thereby increasing classification speed.
(4) Preprocessing short messages, which covers odd words, large quantities of numbers, and the integration of message content and punctuation.
(5) Elaborating the classification criteria, assessing the classification tool through experiments, summarizing the results of the study, pointing out its shortcomings, and proposing suggestions for improvement.

Finally, the short message text classifier is assessed. The experiments show that the tool has relatively high classification accuracy and achieves the desired experimental results...
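
To make the feature weighting and KNN classification stages concrete, the following is a minimal sketch under stated assumptions, not the system described in this paper. It weights terms by information gain alone (the variance component of the proposed weight is not specified here), and it uses a simple top-k selection as a stand-in for rough set attribute reduction. All function names and the toy messages are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the message'."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def select_features(docs, labels, k):
    """Keep the k terms with the highest information gain.

    Assumption: plain top-k selection stands in for rough set reduction.
    """
    vocab = sorted({t for d in docs for t in d})
    scores = {t: information_gain(docs, labels, t) for t in vocab}
    top = sorted(vocab, key=lambda t: scores[t], reverse=True)[:k]
    return {t: scores[t] for t in top}


def vectorize(tokens, weights):
    """Sparse vector: term frequency times the term's information gain."""
    counts = Counter(tokens)
    return {t: counts[t] * w for t, w in weights.items() if counts[t]}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(tokens, train_docs, train_labels, weights, k=3):
    """Majority vote among the k training messages most similar to the query."""
    query = vectorize(tokens, weights)
    scored = sorted(
        ((cosine(query, vectorize(d, weights)), y)
         for d, y in zip(train_docs, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus of already tokenized messages.
TRAIN_DOCS = [
    ["win", "free", "prize", "now"],
    ["free", "cash", "call", "now"],
    ["claim", "prize", "free"],
    ["meeting", "at", "noon"],
    ["see", "you", "at", "lunch"],
    ["call", "me", "after", "meeting"],
]
TRAIN_LABELS = ["spam", "spam", "spam", "ham", "ham", "ham"]

if __name__ == "__main__":
    weights = select_features(TRAIN_DOCS, TRAIN_LABELS, k=5)
    print(knn_classify(["free", "prize", "today"], TRAIN_DOCS, TRAIN_LABELS, weights))
    print(knn_classify(["meeting", "at", "lunch"], TRAIN_DOCS, TRAIN_LABELS, weights))
```

Reducing the vocabulary before vectorization shortens every message vector, so each similarity computation in the KNN step touches fewer terms. This is the speed benefit that the abstract attributes to the reduction stage.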