
Research On SMS Spam Filtering Based On Winnow And CAPTCHA

Posted on: 2011-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Zhang
GTID: 2178330332458701
Subject: Computer software and theory
Abstract/Summary:
With the development of mobile communication technology, SMS has become an indispensable means of everyday communication. However, the number of spam messages keeps increasing, and SMS spam causes considerable inconvenience in people's lives. SMS spam filtering has therefore become a global research subject of practical significance. This thesis studies content-based SMS spam filtering techniques in depth, improves the Winnow algorithm, and proposes a new SMS filtering method based on CAPTCHA. The main work is as follows:
1) Winnow is a classification algorithm that can be updated online by adjusting its weight vector. To reduce the training complexity of classification, a pruning step is added after each weight update; it removes features that have little impact on the classification result.
2) Following the idea of user-interactive learning, the Winnow classifier is updated from user feedback. Using the messages that the user has identified as misclassified, the proposed method finds other misclassified messages that the user has not yet identified and corrects these earlier classification errors; at the same time, it updates the classifier rules so that the same mistakes are avoided in later classification. When the Winnow classifier is updated, its performance is improved with the AdaBoost algorithm, which is modified in two respects: to solve the degradation problem, sample weights are adjusted within the sample set; to solve the asymmetry between legitimate messages and SMS spam, the weighting coefficient of the classifier is modified.
3) A multi-classifier model is proposed in which each SMS is classified according to the results of all classifiers. Different feature sets are extracted from the same training set, and each feature set trains a different classifier. A two-Winnow-classifier model has been implemented.
4) To filter SMS spam sent by computer programs, a method based on CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is proposed. In addition, a new CAPTCHA method based on image recognition is proposed.
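The online update with pruning (task 1) and the feedback-driven correction (task 2) can be sketched in Python. This is a minimal sketch under stated assumptions, not the thesis's implementation: it uses the balanced Winnow variant (a positive weight u and a negative weight v per feature, score = sum of u - v over the message's features), binary bag-of-words features, promotion factor 1.5, demotion factor 0.5, threshold 0, and a hypothetical pruning rule that deletes a feature once |u - v| falls below `prune_eps`, since such a feature barely moves the score. The helper `apply_feedback` is also hypothetical: the user's correction drives one mistake-driven update, after which the other stored messages are re-classified and any whose label flips is reported.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


class PrunedBalancedWinnow:
    """Balanced Winnow with a pruning pass after every weight update.

    Assumed (not from the thesis): binary features, promotion alpha=1.5,
    demotion beta=0.5, threshold theta=0, and a hypothetical pruning rule
    that drops a feature once |u - v| < prune_eps.
    Labels: +1 = spam, -1 = legitimate.
    """

    def __init__(self, alpha: float = 1.5, beta: float = 0.5,
                 theta: float = 0.0, prune_eps: float = 0.05) -> None:
        self.alpha, self.beta = alpha, beta
        self.theta = theta
        self.prune_eps = prune_eps
        # feature -> (u, v); an absent feature acts as u = v = 1, i.e. contributes 0
        self.w: Dict[str, Tuple[float, float]] = {}

    def score(self, features: Iterable[str]) -> float:
        total = 0.0
        for f in set(features):
            u, v = self.w.get(f, (1.0, 1.0))
            total += u - v
        return total

    def predict(self, features: Iterable[str]) -> int:
        return 1 if self.score(features) > self.theta else -1

    def learn(self, features: Iterable[str], label: int) -> bool:
        """Mistake-driven update; returns True if the weights changed."""
        active = set(features)
        if self.predict(active) == label:
            return False
        # Missed spam: promote u, demote v. Legitimate message flagged: the reverse.
        up, down = (self.alpha, self.beta) if label == 1 else (self.beta, self.alpha)
        for f in active:
            u, v = self.w.get(f, (1.0, 1.0))
            self.w[f] = (u * up, v * down)
        self._prune(active)
        return True

    def _prune(self, touched: Iterable[str]) -> None:
        # Only the features just updated can have changed, so only they are checked.
        for f in touched:
            u, v = self.w[f]
            if abs(u - v) < self.prune_eps:
                del self.w[f]


def apply_feedback(clf: PrunedBalancedWinnow, inbox: Sequence[List[str]],
                   labels: List[int], idx: int, correct_label: int) -> List[int]:
    """Hypothetical interactive-learning step.

    The user corrects message `idx`; the classifier is updated, the other
    stored messages are re-classified, and the indices whose label changed
    are returned (`labels` is updated in place).
    """
    clf.learn(inbox[idx], correct_label)
    labels[idx] = correct_label  # the user's judgement is final for this message
    changed = []
    for i, msg in enumerate(inbox):
        if i == idx:
            continue
        new_label = clf.predict(msg)
        if new_label != labels[i]:
            labels[i] = new_label
            changed.append(i)
    return changed
```

For example, learning a missed spam ["win", "prize", "now"] promotes all three features; a later false alarm on ["meeting", "now"] demotes "now" back to u = v = 0.75, so it is pruned. That is the intended effect on features that occur equally in spam and legitimate messages. Because an absent feature behaves as u = v = 1, a pruned feature that reappears simply restarts from a neutral weight.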
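The AdaBoost step in task 2 builds on the standard discrete AdaBoost round sketched below. The thesis's two modifications are not described in enough detail to reproduce, so the `ham_cost` parameter is a hypothetical stand-in for the asymmetry fix: mistakes on legitimate messages are counted `ham_cost` times when computing the weighted error ε that sets the classifier coefficient α = ½ ln((1 - ε)/ε). With `ham_cost = 1` the function is ordinary AdaBoost; the degradation fix (reweighting within the sample set) is not modeled.

```python
import math
from typing import List, Sequence, Tuple


def adaboost_round(preds: Sequence[int], labels: Sequence[int],
                   weights: Sequence[float],
                   ham_cost: float = 1.0) -> Tuple[float, List[float]]:
    """One round of discrete AdaBoost; labels are +1 (spam) / -1 (legitimate).

    ham_cost > 1 is a hypothetical asymmetry tweak: errors on legitimate
    messages weigh ham_cost times more in the error that sets alpha.
    ham_cost = 1 gives standard AdaBoost.
    Returns (alpha, normalized sample weights for the next round).
    """
    costs = [ham_cost if y == -1 else 1.0 for y in labels]
    total = sum(w * c for w, c in zip(weights, costs))
    wrong = sum(w * c for w, c, h, y in zip(weights, costs, preds, labels) if h != y)
    err = wrong / total
    if err >= 0.5:
        raise ValueError("weak classifier is no better than chance on the weighted sample")
    err = max(err, 1e-12)  # avoid division by zero for a perfect weak classifier
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Standard reweighting: misclassified samples gain weight, correct ones lose it.
    new_w = [w * math.exp(-alpha * y * h) for w, h, y in zip(weights, preds, labels)]
    z = sum(new_w)
    return alpha, [w / z for w in new_w]
```

For example, with four equally weighted messages and one legitimate message misclassified, standard AdaBoost gives α = ½ ln 3 ≈ 0.549 and raises the misclassified message's weight to 0.5; setting `ham_cost = 2` lowers α to ½ ln 2.5 ≈ 0.458, so a weak classifier that blocks legitimate mail counts for less in the final weighted vote.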
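The multi-classifier model in task 3 can be sketched as a wrapper that routes the same message through several feature extractors, each feeding its own classifier. The two extractors (whitespace word tokens and character bigrams) and the voting rule (spam only when at least `min_spam_votes` members agree, defaulting to all of them) are hypothetical choices; the abstract does not state which feature sets are used or how the two Winnow classifiers' outputs are combined. A member can be any object with `predict(features)` returning +1/-1 and `learn(features, label)`, such as a Winnow classifier.

```python
from typing import Callable, List, Optional, Sequence, Tuple

FeatureExtractor = Callable[[str], List[str]]


def word_features(text: str) -> List[str]:
    # Whitespace tokens; this only works for space-delimited text.
    return text.lower().split()


def char_bigrams(text: str) -> List[str]:
    # Overlapping character bigrams with whitespace removed.
    s = "".join(text.lower().split())
    return [s[i:i + 2] for i in range(len(s) - 1)]


class MultiClassifierModel:
    """Hypothetical multi-classifier model: one classifier per feature set.

    Each member is (extractor, classifier); the classifier needs
    predict(features) -> +1/-1 and learn(features, label).
    """

    def __init__(self, members: Sequence[Tuple[FeatureExtractor, object]],
                 min_spam_votes: Optional[int] = None) -> None:
        self.members = list(members)
        # Default: unanimous vote, so a message is spam only if every member agrees.
        self.min_spam_votes = len(self.members) if min_spam_votes is None else min_spam_votes

    def predict(self, text: str) -> int:
        votes = sum(1 for extract, clf in self.members if clf.predict(extract(text)) == 1)
        return 1 if votes >= self.min_spam_votes else -1

    def learn(self, text: str, label: int) -> None:
        # Every member trains on the same message, seen through its own feature set.
        for extract, clf in self.members:
            clf.learn(extract(text), label)
```

The unanimous default leans toward letting messages through, which is one way to respect the cost asymmetry between blocking a legitimate SMS and missing a spam; passing `min_spam_votes=1` makes the model flag a message as soon as any member does.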
Keywords/Search Tags: Spam Messages, Winnow Algorithm, Pruning, Interactive Learning, Multi-Classifier-Model, CAPTCHA, Picture Recognition