A Detection System For Bulk SMS Spam Based On Hadoop

Posted on: 2014-10-08; Degree: Master; Type: Thesis
Country: China; Candidate: Y D Lu; Full Text: PDF
GTID: 2308330464461425; Subject: Computer technology
Abstract/Summary:
The Short Messaging Service (SMS) has become indispensable in daily life. Alongside the convenience of SMS, however, the volume of mobile SMS spam has increased dramatically. This volume of spam seriously undermines the confidence of telecommunications users in their service providers. Specific data mining techniques have been developed to detect spam. The early approaches were mainly content-based, owing to the similarity between spam e-mail and spam SMS.

The deployment of the All Network Multi-protocols Signaling Capture & Analyze System for Shanghai Telecom made it possible to develop an SMS spam detection and filtering application. We focus on the problem of identifying professional spammers from their overall message-sending patterns. We define professional spammers as those who have purchased a mobile communication ID for the sole purpose of sending large numbers of spam messages for commercial gain.

The system adopts a distributed computing architecture, Hadoop, to process this volume of SMS data. We examine the effectiveness of various content-less features, ranging from network-oriented to time-oriented categories, along with content-based features stored in Bloom filters, whose keywords are extracted from the training sets by TF-IDF. Each of these features yields only a weak classifier, so we combine them into a strong classifier using AdaBoost, which automatically detects spam SMS senders after a supervised learning process.

Finally, the accuracy and benchmark performance of the classifier and of the boosting model are verified through laboratory testing with virtual machines and simulated operation on a single server. Results show that the system can evaluate spam SMS senders relatively correctly and objectively from both workday and holiday data sets.
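
As a rough illustration of the approach summarized above, the following Python sketch stores spam keywords in a Bloom filter and combines single-feature weak classifiers with AdaBoost. It is not the thesis's implementation, which runs on Hadoop and is not shown in this abstract. The class and function names, the filter size, the use of decision stumps as weak learners, the three feature columns, and the toy data are all assumptions made for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Small Bloom filter for spam keywords (sizes are hypothetical)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, word):
        # Derive several bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{word}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos] = 1

    def __contains__(self, word):
        # May return false positives, never false negatives.
        return all(self.bits[pos] for pos in self._positions(word))


def keyword_hit_rate(messages, keywords):
    """Fraction of messages containing at least one stored keyword."""
    hits = sum(any(tok in keywords for tok in m.lower().split()) for m in messages)
    return hits / len(messages)


def stump_predict(x, feature, threshold, polarity):
    """Weak classifier on one feature: returns +1 (spammer) or -1."""
    return polarity if x[feature] >= threshold else -polarity


def train_adaboost(samples, labels, rounds=5):
    """Train AdaBoost over decision stumps; labels must be +1 or -1."""
    n = len(samples)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for feature in range(len(samples[0])):
            for threshold in sorted({x[feature] for x in samples}):
                for polarity in (1, -1):
                    err = sum(w for x, y, w in zip(samples, labels, weights)
                              if stump_predict(x, feature, threshold, polarity) != y)
                    if best is None or err < best[0]:
                        best = (err, feature, threshold, polarity)
        err, feature, threshold, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, feature, threshold, polarity))
        # Increase the weight of misclassified samples, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(x, feature, threshold, polarity))
                   for x, y, w in zip(samples, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps: +1 (spammer) or -1 (normal sender)."""
    score = sum(a * stump_predict(x, f, t, p) for a, f, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data. Columns (all hypothetical): messages per hour,
# distinct-recipient ratio, keyword hit rate.
spam_words = BloomFilter()
for word in ("discount", "winner", "loan"):
    spam_words.add(word)

samples = [(500, 0.98, 0.9), (800, 0.99, 0.8), (300, 0.95, 1.0),
           (3, 0.40, 0.0), (10, 0.30, 0.1), (1, 1.00, 0.0)]
labels = [1, 1, 1, -1, -1, -1]
model = train_adaboost(samples, labels)
```

In this sketch each sender is reduced to a small feature vector, so `predict(model, (400, 0.97, 0.7))` scores one hypothetical sender. In a Hadoop setting, per-sender feature aggregation would be the natural part to distribute, though the abstract does not say how the thesis divides the work.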
Keywords/Search Tags: Short Messaging Service, Spam SMS, Hadoop, Boosting Classifiers