Font Size: a A A

Securing Large Cellular Networks via A Data Oriented Approach: Applications to SMS Spam and Voice Fraud Defenses

Posted on:2014-01-14Degree:Ph.DType:Thesis
University:University of MinnesotaCandidate:Jiang, NanFull Text:PDF
GTID:2458390005487303Subject:Computer Science
Abstract/Summary:
In this thesis, we share our experience and approach in building operational defense systems against SMS spam and voice fraud in large-scale cellular networks. Our approach is data oriented, i.e., we collect real data from a large national cellular network and exert significant efforts in analyzing and making sense of the data, especially to understand the characteristics of fraudsters and the communication patterns between fraudsters and victims. On top of the data analysis results, we can identify the best predictive features that can alert us of emerging fraud activities. Usually, these features represent unwanted communication patterns which are derived from the original feature space. Using these features, we apply advanced machine learning techniques to train accurate detection models. To ensure the validity of the proposed approaches, we build and deploy the defense systems in operational cellular networks and carry out both extensive off-line evaluation and long-term online trial. To evaluate the system performance, we adopt both direct measurement using known fraudster blacklist provided by fraud agents and indirect measurement by monitoring the change of victim report rates. In both problems, the proposed approaches demonstrate promising results which outperform customer feedback based defenses that have been widely adopted by cellular carriers today.;More specifically, using a year (June 2011 to May 2012) of user reported SMS spam messages together with SMS network records collected from a large US based cellular carrier, we carry out a comprehensive study of SMS spamming. Our analysis shows various characteristics of SMS spamming activities. and also reveals that spam numbers with similar content exhibit strong similarity in terms of their sending patterns, tenure, devices and geolocations. Using the insights we have learned from our analysis, we propose several novel spam defense solutions. For example, we devise a novel algorithm for detecting related spam numbers. The algorithm incorporates user spam reports and identifies additional (unreported) spam number candidates which exhibit similar sending patterns at the same network location of the reported spam number during the nearby time period. The algorithm yields a high accuracy of 99.4% on real network data. Moreover, 72% of these spam numbers are detected at least 10 hours before user reports.;From a different angle, we present the design of Greystar, a defense solution against the growing SMS spam traffic in cellular networks. By exploiting the fact that most SMS spammers select targets randomly from the finite phone number space, Greystar monitors phone numbers from the gray phone space (which are associated with data only devices like data cards and modems and machine-to-machine communication devices like point-of-sale machines and electricity meters) to alert emerging spamming activities. Greystar employs a novel statistical model for detecting spam numbers based on their footprints on the gray phone space. Evaluation using five month SMS call detail records from a large US cellular carrier shows that Greystar can detect thousands of spam numbers each month with very few false alarms and 15% of the detected spam numbers have never been reported by spam recipients. Moreover, Greystar is much faster than victim spam reports. By deploying Greystar we can reduce 75% spam messages during peak hours.;To defend against voice-related fraud activities, we develop a novel methodology for detecting voice-related fraud activities using only call records. More specifically, we advance the notion of voice call graphs to represent voice calls from domestic callers to foreign recipients and propose a Markov Clustering based method for isolating dominant fraud activities from these international calls. Using data collected over a two year period from one of the largest cellular networks in the US, we evaluate the efficacy of the proposed fraud detection algorithm and conduct systematic analysis of the identified fraud activities. Our work sheds light on the unique characteristics and trends of fraud activities in cellular networks, and provides guidance on improving and securing hardware/software architecture to prevent these fraud activities. (Abstract shortened by UMI.).
Keywords/Search Tags:SMS, Fraud, Cellular networks, Data, Defense, Voice, Approach, Large
Related items