Research On Content Based Spam Short Messages Identifying

Posted on: 2015-01-22  Degree: Master  Type: Thesis
Country: China  Candidate: N Ma  Full Text: PDF
GTID: 2298330467463211  Subject: Computer Science and Technology
Abstract/Summary:
With mobile phones now widely available, the short message service is a double-edged sword for users. It brings convenience to daily life, but it also plagues users with spam messages and can even affect the stability and solidarity of society as a whole. Identifying and controlling spam messages is therefore urgent, so that mobile phone users can enjoy a clean short-message environment.

This thesis studies content-based methods for identifying spam short messages and designs two spam identification systems, one based on a decision tree and one based on a support vector machine. The main work includes an introduction to the technical background, the collection and analysis of an SMS corpus to build a sample library, the design and implementation of the two identification systems, and the analysis of comparative experiments.

For the decision-tree system, the guiding requirement is "better to let a spam message through than to misclassify a normal one". To reduce the risk of a high false-positive rate, this thesis combines the traditional SMS features with a new feature, normal-SMS keywords, and replaces the dangerous-punctuation feature with the rate of unusual characters. The system was designed and implemented on this basis, and experiments were conducted with five comparative groups.

For the support-vector-machine system, this thesis proposes combining first-order word features with second-order word features, to improve on the accuracy of a system that uses only simple first-order words as features. The system was designed and implemented on this basis, and experiments were conducted with nine comparative groups.

The experimental results show that all of the new features proposed in this thesis improve the performance of the existing spam identification systems.
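The abstract does not define "second-order word features". The following is a minimal sketch of one possible reading, in which first-order features are single words and second-order features are pairs of adjacent words. The function name `word_features`, the `"w1"`/`"w2"` tags, and the sample message are hypothetical and do not come from the thesis. The sketch also assumes the message has already been split into tokens (Chinese SMS text would first need word segmentation).

```python
from collections import Counter


def word_features(tokens):
    """Count first-order and second-order word features for one message.

    Assumption: "second-order" is read here as adjacent word pairs;
    the thesis may define it differently.
    """
    feats = Counter()
    # First-order features: each individual word.
    for tok in tokens:
        feats[("w1", tok)] += 1
    # Second-order features: each pair of neighbouring words.
    for left, right in zip(tokens, tokens[1:]):
        feats[("w2", left, right)] += 1
    return feats


# Hypothetical example message, already tokenized.
msg = "win a free prize now".split()
feats = word_features(msg)
```

The resulting counts could be mapped to a sparse vector and passed to any SVM implementation. Keeping both feature orders in one counter lets a classifier weigh a word pair such as ("free", "prize") separately from the individual words.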
Keywords/Search Tags: spam short message, identifying, text categorization, decision tree, support vector machine