
Behavior Based Spam Detection And Analysis In Online Social Network

Posted on: 2012-03-18    Degree: Doctor    Type: Dissertation
Country: China    Candidate: J Hu    Full Text: PDF
GTID: 1118330368984116    Subject: Information security
Abstract/Summary:
As Web 2.0 develops, online social networks (OSNs) have become popular collaboration and communication tools for millions of Internet users. Unfortunately, recent evidence shows that these trusted communities can become effective mechanisms for spreading malware and phishing attacks. Popular OSNs are increasingly the target of phishing attacks launched from large botnets, and OSN account credentials are already being sold in underground online forums. Spammers have begun to use different electronic media for spam propagation, and spamming techniques have grown increasingly sophisticated. To evade detection, spammers expand their propagation channels in various ways, shifting from traditional email to online social networks, instant messaging, mobile phones, online gaming, and blogs, and they diversify the form of spam from plain text to images and attachments. Solutions to and research on these security problems are therefore valuable and significant.

This dissertation measures and analyzes attempts to spread malicious content on OSNs. The work is based on a large dataset of "wall" messages from Facebook. Wall posts are the primary form of communication on Facebook, where a user can leave messages on the public profile of a friend. Wall messages remain on a user's profile unless explicitly removed by the owner. As such, wall messages are the intuitive place to look for attempts to spread malicious content on Facebook, since the messages are persistent and public, i.e., likely to be viewed by the target user and potentially the target's friends. Through crawls of several Facebook regional networks conducted in 2009, a large anonymized dataset of Facebook users, their friendship relationships, and 1.5-year histories of wall posts for each user was obtained. In total, the dataset contains over 187 million wall posts received by 3.5 million users.

The study of Facebook wall posts consists of two key phases. In the first phase, all wall messages are analyzed using a number of complementary techniques to identify attempts to spread malicious content. The analysis focuses on messages that contain URLs or web addresses in text form. From these messages, correlated subsets of wall posts are produced: the algorithm models each post as a node and creates an edge between any two nodes that refer to the same URL or that share similar text content as defined by an approximate textual fingerprint. This process creates a number of connected subgraphs that partition all suspect wall messages into mutually exclusive subsets, where the messages within a subset are potentially related. Using the dual behavioral hints of bursty activity and distributed communication, the subsets of messages that exhibit properties of malicious spam campaigns are identified. Several complementary mechanisms are then used to validate this technique, showing that the approach is highly effective at detecting the spread of malicious content.
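The grouping step described above can be illustrated with a minimal sketch; this is not the dissertation's code. Posts that share a URL or an approximate textual fingerprint are linked, and the resulting connected components become the candidate campaigns. The fingerprint here is a simple hashed word-shingle, and a single shared shingle is used as a crude proxy for textual similarity; the post schema, function names, and parameters are all assumptions for illustration.

    # Rough stand-in for the campaign-grouping step: link posts that share a URL
    # or a word-shingle fingerprint, then take connected components via union-find.
    import re
    from collections import defaultdict

    def shingles(text, k=4):
        # Crude approximate textual fingerprint: hashed k-word shingles.
        # Posts shorter than k words contribute no text edges.
        words = re.findall(r"\w+", text.lower())
        return {hash(tuple(words[i:i + k])) for i in range(len(words) - k + 1)}

    class UnionFind:
        def __init__(self, n):
            self.parent = list(range(n))
        def find(self, x):
            while self.parent[x] != x:
                self.parent[x] = self.parent[self.parent[x]]
                x = self.parent[x]
            return x
        def union(self, a, b):
            self.parent[self.find(a)] = self.find(b)

    def group_posts(posts):
        # posts: list of dicts with 'url' (str or None) and 'text' keys (illustrative schema).
        uf = UnionFind(len(posts))
        buckets = defaultdict(list)           # shared URL or shared shingle -> post indices
        for i, p in enumerate(posts):
            if p.get("url"):
                buckets[("url", p["url"])].append(i)
            for s in shingles(p["text"]):
                buckets[("shingle", s)].append(i)
        for members in buckets.values():      # any shared key links the posts
            for j in members[1:]:
                uf.union(members[0], j)
        groups = defaultdict(list)            # connected components = candidate campaigns
        for i in range(len(posts)):
            groups[uf.find(i)].append(i)
        return list(groups.values())

In the dissertation, the resulting groups are then filtered using the two behavioral hints noted above, bursty posting activity within a group and a large number of distinct senders, before being validated.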
In the second phase, the system analyzes the characteristics of the identified malicious wall posts. The results provide several interesting observations on the spread of malicious content in OSNs and on the behavior of the users that spread it. Phishing is by far the most popular attack on Facebook. Users who spread malicious content communicate using very different patterns from the average user, and malicious users stand out both in the bursty nature of their wall posts and in their diurnal activity patterns. By studying the time duration of malicious messages and the lifetimes of the users that send them, the dissertation concludes that the overwhelming majority of spam messages are sent through compromised accounts rather than fake accounts created specifically for spam delivery. Finally, the largest observed spam campaigns are studied, with observations about their attack goals and sales pitches.

The dissertation then presents CUD (Crowdsourcing for URL spam detection) as a supplement to existing validation tools for identifying spam URLs. CUD leverages human intelligence for URL classification through crowdsourcing: it crawls existing user comments about spam URLs already on the Internet and employs sentiment analysis from natural language processing to analyze those comments automatically. Because CUD does not use features directly associated with the URLs and their landing pages, it is more robust when attackers change their strategies. In the evaluation, up to 70% of URLs are found to have user comments online, and CUD achieves a true positive rate of 86.8% with a false positive rate of 0.9%. Moreover, about 75% of the spam URLs CUD detects are missed by other approaches.

Finally, the dissertation characterizes spammers by leveraging both spam payload and the traffic properties of spam-sending nodes. Bots are found to send more SMTP packets than normal users, and their SMTP packet volume increases from morning to evening, while the pattern of normal users is the opposite. Bots in the same botnet also share the same destinations, sending spam to the same IP cluster, whereas different botnets have different targets.
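The comment-based classification in CUD, described in the second paragraph above, can be sketched as follows; this is an illustration, not the dissertation's implementation. It assumes the comments mentioning a URL have already been crawled, and it uses NLTK's off-the-shelf VADER sentiment analyzer together with a small spam-keyword list as stand-ins for the sentiment analysis described in the abstract; the function name, thresholds, and lexicon are assumptions.

    # Minimal sketch of CUD-style classification from crawled user comments about a URL.
    # Uses NLTK's VADER analyzer as a stand-in sentiment model (requires the
    # 'vader_lexicon' resource: nltk.download('vader_lexicon')); keyword list and
    # thresholds are illustrative, not the dissertation's parameters.
    from nltk.sentiment import SentimentIntensityAnalyzer

    SPAM_HINTS = {"spam", "scam", "phishing", "malware", "fake"}   # illustrative lexicon

    def looks_like_spam(comments, neg_threshold=-0.3, vote_ratio=0.5):
        if not comments:
            return False                   # no online comments: this sketch cannot decide
        sia = SentimentIntensityAnalyzer()
        votes = 0
        for c in comments:
            score = sia.polarity_scores(c)["compound"]   # -1 (negative) .. +1 (positive)
            mentions_spam = any(w in c.lower() for w in SPAM_HINTS)
            if score <= neg_threshold or mentions_spam:
                votes += 1
        return votes / len(comments) >= vote_ratio

    # Hypothetical comments crawled for a suspicious shortened URL.
    comments = ["This link is a phishing scam, do not click it",
                "I got malware after opening this",
                "Looks fine to me"]
    print(looks_like_spam(comments))       # True for this example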
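Similarly, the traffic-side characterization in the last paragraph can be sketched with per-host hourly SMTP packet counts (to compare the diurnal patterns of bots and normal users) and per-host destination /24 tallies (to check whether suspected bots in the same botnet target the same IP cluster). The record schema and field names below are assumed for illustration; in practice the records would be extracted from a packet capture.

    # Sketch of the traffic-side features: hourly SMTP packet counts per source host
    # and destination /24 prefixes per source host. Record schema is illustrative.
    from collections import Counter, defaultdict
    from datetime import datetime

    def summarize_smtp(packets):
        # packets: iterable of dicts with 'ts' (datetime), 'src', 'dst', 'dport'.
        hourly = defaultdict(Counter)        # src -> hour of day -> SMTP packet count
        dst_clusters = defaultdict(Counter)  # src -> destination /24 prefix -> count
        for p in packets:
            if p["dport"] != 25:             # keep SMTP traffic only
                continue
            hourly[p["src"]][p["ts"].hour] += 1
            prefix = ".".join(p["dst"].split(".")[:3]) + ".0/24"
            dst_clusters[p["src"]][prefix] += 1
        return hourly, dst_clusters

    # Hypothetical record; a rising evening count and a concentrated destination
    # cluster would match the bot behavior reported above.
    pkts = [{"ts": datetime(2011, 5, 1, 21, 3), "src": "10.0.0.5",
             "dst": "203.0.113.7", "dport": 25}]
    hourly, clusters = summarize_smtp(pkts)
    print(dict(hourly["10.0.0.5"]), dict(clusters["10.0.0.5"]))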
Keywords/Search Tags: Spam, online social network, botnet, crowdsourcing, sentiment analysis